Medical World Search has three components: the
Web crawler, the indexer, and the query processor.
The Web crawler seeks out medical sites on the
World Wide Web, starting from some of the major
entry points for clinical medicine, then retrieves
them and stores them on Medical World Search's
disk system. The indexer recognizes medical
concepts in the pages retrieved by the Web crawler,
and generates a large index of all medical concepts
and words in the Web pages; this index shows in which
pages each concept and word appears. The query
processor allows users to specify their information
needs and then attempts to match the query
optimally to Web pages using the index generated
previously. Results are ranked and returned to the
user.
The Web Crawler
Medical World Search's Web crawler uses a
combination of automated retrieval of Web sites and
manual selection to retrieve and store only Web
pages that have valuable clinical information. This
task is made easier by the ability to detect medical
concepts in a page and to rank the page's
importance accordingly.
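As a rough illustration of concept-guided crawling, the sketch below walks a
small in-memory "web" breadth-first from a seed page and stores only pages that
mention at least one medical term. The term list, the `web` dictionary, the
`min_concepts` threshold, and the decision not to follow links from off-topic
pages are all assumptions made for this example; the source does not describe
the crawler at this level of detail, and manual selection is not modeled.

```python
from collections import deque

# Hypothetical stand-in for the medical vocabulary used to judge a page.
MEDICAL_TERMS = {"infarction", "angina", "diagnosis", "therapy"}

def concept_count(text):
    """Count how many words of the page are known medical terms."""
    words = (w.strip(".,;:") for w in text.lower().split())
    return sum(1 for w in words if w in MEDICAL_TERMS)

def crawl(web, seeds, min_concepts=1):
    """Breadth-first crawl over `web` (url -> (text, links)).

    Returns {url: concept_count} for pages judged clinically relevant.
    Pages below the threshold are neither stored nor followed (an
    assumption of this sketch).
    """
    stored = {}
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link
        text, links = web[url]
        count = concept_count(text)
        if count < min_concepts:
            continue
        stored[url] = count
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return stored

# Toy data: no network access is involved.
web = {
    "seed": ("Clinical diagnosis and therapy index.", ["mi", "shop"]),
    "mi": ("Myocardial infarction therapy.", ["seed"]),
    "shop": ("Buy shoes online.", ["spam"]),
    "spam": ("angina angina", []),
}
stored = crawl(web, ["seed"])
```

Here `stored` holds only the "seed" and "mi" pages. The "shop" page is
rejected, so the page it links to is never visited.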
The Indexer
Medical World Search's indexer uses an optimized
algorithm to recognize over 200,000 different
medical concepts in Web pages. The index built by
the indexer contains nearly all words and medical
concepts present in a Web page. Indexing by
medical concepts allows the indexer to represent, for
instance, heart attack and myocardial infarction as
the same concept. These medical concepts are
stored in the index together with full information
about their relationships. For instance, heart attack is
represented as a kind of heart disease, which makes it
easy to search for all documents about heart diseases.
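The idea of indexing by concept rather than by word alone can be sketched as
follows. This is a minimal illustration, not the system's actual algorithm: the
concept identifiers (`C_MI` and so on), the tiny term table, and the function
names are hypothetical, and a real thesaurus would hold hundreds of thousands
of entries.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus: surface term -> concept identifier.
TERM_TO_CONCEPT = {
    "heart attack": "C_MI",
    "myocardial infarction": "C_MI",
    "angina pectoris": "C_ANGINA",
}
# Hypothetical "is a kind of" links: concept -> broader concept.
PARENT = {"C_MI": "C_HEART_DISEASE", "C_ANGINA": "C_HEART_DISEASE"}

def build_index(pages):
    """Map every word and every concept id to the set of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        lowered = text.lower()
        for word in re.findall(r"[a-z]+", lowered):
            index[word].add(page_id)
        for term, concept in TERM_TO_CONCEPT.items():
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                index[concept].add(page_id)
    return index

def pages_about(index, concept):
    """Pages mentioning the concept or any concept narrower than it."""
    result = set(index.get(concept, set()))
    for child, parent in PARENT.items():
        if parent == concept:
            result |= pages_about(index, child)
    return result

pages = {
    "p1": "Managing a heart attack in the emergency room.",
    "p2": "Myocardial infarction: diagnosis and treatment.",
    "p3": "Angina pectoris and exercise.",
    "p4": "Asthma in children.",
}
index = build_index(pages)
```

In this toy data, `index["C_MI"]` contains both p1 and p2 even though the two
pages use different wording, and `pages_about(index, "C_HEART_DISEASE")`
returns p1, p2, and p3 by following the hierarchy.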
The Query Processor and Results Ranking
The query processor is the portion of Medical World
Search with which medical professionals interact
directly. The user interface allows users to easily
specify the desired query, as a combination of
medical concepts and words. Boolean queries can
be formulated. By default, more specific terms are
automatically added to the search, so that a query on heart
diseases results in a search for heart attack,
coronary artery disease, angina pectoris, and so
forth. The user can, however, choose not to add the
more specific terms.
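The default expansion to more specific terms, and the option to switch it off,
might look like the sketch below. The `NARROWER` table and the
`include_narrower` flag are hypothetical names chosen for this example; the
source does not say how the option is exposed to the user.

```python
# Hypothetical "more specific term" links taken from the example in the text.
NARROWER = {
    "heart diseases": [
        "heart attack",
        "coronary artery disease",
        "angina pectoris",
    ],
}

def expand_query(term, include_narrower=True):
    """Return the query term followed by all of its more specific terms.

    With include_narrower=False the term is searched as given.
    """
    terms = [term]
    if include_narrower:
        for child in NARROWER.get(term, []):
            # Recurse so that terms several levels down are also added.
            for expanded in expand_query(child, True):
                if expanded not in terms:
                    terms.append(expanded)
    return terms

default_terms = expand_query("heart diseases")
exact_terms = expand_query("heart diseases", include_narrower=False)
```

With the default setting the query covers all four terms. With expansion
disabled, only "heart diseases" itself is searched.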
When a user of Medical World Search submits a
query, its medical concepts and words are quickly
looked up in the index generated previously. The
result is a list of Web pages containing the medical
concepts and words in the query.
Medical World Search then ranks the Web pages in
order of relevance to the user's query. Ranking
draws on knowledge about medical concepts and their
relationships, as well as on the number of times a
medical concept appears in the page and the length
of the page.
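The source names the ranking signals (concept relationships, concept
frequency, and page length) but not how they are combined. The sketch below
shows one plausible combination, purely as an assumption: occurrences of the
queried concept count fully, occurrences of narrower concepts count at a
reduced weight, and the total is divided by page length. The function names,
the `narrower_weight` value of 0.5, and the sample data are all invented for
illustration.

```python
def score_page(concept_counts, page_length, query_concept, narrower,
               narrower_weight=0.5):
    """Length-normalized, relationship-weighted concept frequency."""
    if page_length <= 0:
        return 0.0
    weighted = float(concept_counts.get(query_concept, 0))
    for concept in narrower:
        weighted += narrower_weight * concept_counts.get(concept, 0)
    return weighted / page_length

def rank_pages(pages, query_concept, narrower):
    """Return ids of matching pages, best first (ties broken by page id).

    `pages` maps page id -> (concept_counts, page_length_in_words).
    """
    scored = []
    for page_id, (counts, length) in pages.items():
        score = score_page(counts, length, query_concept, narrower)
        if score > 0:
            scored.append((score, page_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page_id for _, page_id in scored]

pages = {
    "a": ({"heart disease": 4}, 200),  # exact concept, short page
    "b": ({"heart attack": 4}, 200),   # narrower concept only
    "c": ({"heart disease": 4}, 800),  # exact concept, long page
    "d": ({"asthma": 3}, 100),         # unrelated
}
ranking = rank_pages(pages, "heart disease", ["heart attack"])
```

With this data the ranking is a, b, c, and the unrelated page d is left out.
Other weightings would be equally consistent with the description.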
The Medical Intelligence
Medical World Search has knowledge of over
500,000 medical terms including relationships
between these terms, such as synonyms, more
specific or more general terms, and definitions.
Users of Medical World Search can easily browse
and search this knowledge base. Medical
World Search incorporates the medical thesaurus
developed by the National Library of Medicine
(NLM) as part of its Unified Medical Language
System (UMLS) project. The UMLS thesaurus
integrates disparate medical vocabularies, such as
the NLM's own Medical Subject Headings (MeSH),
the International Classification of Diseases (ICD-9-CM),
the Systematized Nomenclature of Medicine
(SNOMED), and the Current Procedural
Terminology (CPT).