CognitionSearch™ is a linguistic, meaning-based Search engine. It searches for meaning in documents rather than for text-string patterns, which is how most traditional Search engines, such as Google, Yahoo!, MSN and Ask, work.
In order to demonstrate the inherent power and value of CognitionSearch and its advantages over traditional statistical pattern-matching Search engines, we have created this comparison Website. It enables anyone to query the Wikipedia dataset and then compare the results of that query between CognitionSearch™ and a traditional pattern-matching engine. As a proxy for the traditional Search engine, we use Apache Lucene, an open-source statistical pattern-matching Search engine, with its default settings.
The results of the comparison are presented in a number of ways:
- Graphically using a Venn Diagram, illustrating the overlap of the query results;
- In various detailed descriptions, each showing the list of results that either CognitionSearch™, Lucene or both returned.
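The three regions of such a Venn Diagram can be derived with plain set operations. The following is a minimal sketch, not the Comparison application's actual code; the function name `venn_regions` and the document titles are hypothetical and serve only as an illustration.

```python
def venn_regions(cognition_hits, lucene_hits):
    """Split two result lists into the three regions of a two-set Venn diagram."""
    a, b = set(cognition_hits), set(lucene_hits)
    return {
        "cognition_only": a - b,  # returned by CognitionSearch alone
        "both": a & b,            # the overlap of the two result sets
        "lucene_only": b - a,     # returned by Lucene alone
    }


# Hypothetical result titles, for illustration only.
regions = venn_regions(
    ["Stock market", "Share (finance)", "Security (finance)"],
    ["Stock market", "Livestock"],
)
print(regions["both"])  # {'Stock market'}
```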
Users can query the Wikipedia dataset with their own query, or use one of the Sample Queries provided. They can then evaluate the effectiveness of CognitionSearch against that of a traditional pattern-matching Search engine. The Wikipedia dataset we are using is current as of January 8, 2008.
CognitionSearch
This evolutionary software uses state-of-the-art computational linguistic technology to easily and precisely find on-target information from digitized text. CognitionSearch can be applied to datasets on the Internet, within organizational data repositories or within applications which require semantic analysis or Search capabilities. Users pose queries in plain English and CognitionSearch™ interprets their meaning, responding with more precise results than is possible with traditional Search technologies (e.g. pattern matching, concept search, etc.). CognitionSearch produces results which are both highly relevant to the user and very complete. Relevancy and completeness (otherwise known as "Precision" and "Recall", respectively) are much higher than is possible with traditional Search technologies, no matter how the user query is worded.
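For readers unfamiliar with the two measures: Precision is the fraction of retrieved documents that are relevant, and Recall is the fraction of relevant documents that are retrieved. A minimal sketch of the standard definitions follows; the document identifiers are made up for illustration and are not taken from the comparison data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using the standard definitions."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 documents are truly relevant,
# and 2 of the returned documents are among the relevant ones.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
# precision = 2/4, recall = 2/3
```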
Since CognitionSearch searches on meanings and not text patterns, Search Precision is dramatically increased because CognitionSearch "understands" which sense of a word is intended (word disambiguation). When a user poses queries in plain English, CognitionSearch determines what the words in the query mean in the context of the query. If you ask "How can I buy stock on the market?", CognitionSearch determines that "stock" means "share" or "security". It searches only on that meaning of "stock" and doesn't retrieve information about "stocking shelves", "cattle" or "flowers". CognitionSearch returns information with over 90% Precision, reducing the user's need to ponder the large numbers of irrelevant retrievals found with other Search technologies. CognitionSearch's dictionary has over 506,000 word meanings.
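To make the idea of disambiguating a word from its context concrete, here is a deliberately simplified sketch that picks the sense whose cue words overlap most with the rest of the query (a Lesk-style heuristic). It is not CognitionSearch's method; the `SENSES` table, its cue words and the function name `disambiguate` are all assumptions made for this illustration.

```python
import re

# Hypothetical sense inventory: each sense of a word lists a few cue words.
SENSES = {
    "stock": {
        "share": {"buy", "sell", "market", "investor", "security"},
        "inventory": {"shelf", "shelves", "store", "warehouse"},
        "livestock": {"cattle", "farm", "herd"},
    }
}


def disambiguate(word, query):
    """Pick the sense of `word` whose cue words overlap most with the query."""
    context = set(re.findall(r"[a-z]+", query.lower())) - {word}
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


print(disambiguate("stock", "How can I buy stock on the market?"))  # share
```

A real system would draw on far richer context than a handful of cue words; the sketch only shows why "buy" and "market" point to the financial sense.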
Simultaneously (which is unique to our technology), CognitionSearch has far greater Recall, and overcomes the problem of information underload, i.e. not finding anything at all because of differences in wording. It finds information regardless of the way a concept is worded in the target documents. If you ask "Fatal fumes in the workplace?", CognitionSearch finds documents that talk about "gas", "vapor", "steam", etc., terms which were not in the user's original query. CognitionSearch has over 75,000 synonym classes (thesaural groups). It is important to note that some of the words in the example above have ambiguous meanings (e.g. "fume" can mean "a vapor" or it can mean "to be angry"), but CognitionSearch doesn't retrieve irrelevant information triggered by those words when they are used in a different meaning from that of the query. The result is that CognitionSearch retrieves 5 to 7 times more relevant information than other Search technologies, as measured in head-to-head comparisons with other Search engines, while maintaining over 90% Precision.
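Matching through synonym classes can be sketched as follows. This is an illustrative toy, assuming query terms have already been reduced to their base forms and disambiguated; the two classes in `SYNONYM_CLASSES` and the helper names `expand` and `matches` are hypothetical and far smaller than a real thesaurus.

```python
import re

# Hypothetical synonym classes (thesaural groups), one set per meaning.
SYNONYM_CLASSES = [
    {"fume", "gas", "vapor", "steam"},
    {"fatal", "deadly", "lethal"},
]


def expand(term):
    """Return the synonym class containing `term`, or just the term itself."""
    for synonym_class in SYNONYM_CLASSES:
        if term in synonym_class:
            return synonym_class
    return {term}


def matches(query_terms, doc_text):
    """True if every query term, or one of its synonyms, occurs in the document."""
    words = set(re.findall(r"[a-z]+", doc_text.lower()))
    return all(expand(term) & words for term in query_terms)


print(matches(["fatal", "fume"], "Lethal gas leak at factory"))  # True
print(matches(["fatal", "fume"], "Fatal car accident"))          # False
```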
Another source of greater Recall is the software's taxonomy, which enables CognitionSearch to search on specific information when queried on more general information. As an example, if the user searches on "money", the software will find information about "dollar", "pound" and "yen", etc. CognitionSearch's taxonomy covers 506,000 concepts, and is thus very complete. The customer doesn't have to build a taxonomy from scratch, as with other Search technologies.
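The taxonomy lookup described above amounts to collecting every narrower concept beneath the query concept. A minimal sketch, assuming a tiny hypothetical `TAXONOMY` table (the intermediate "currency" level and the function name `narrower_terms` are invented for the example):

```python
# Hypothetical fragment of a taxonomy: each concept maps to its narrower concepts.
TAXONOMY = {
    "money": ["currency"],
    "currency": ["dollar", "pound", "yen"],
}


def narrower_terms(concept):
    """Collect every concept below `concept`, at any depth."""
    found = set()
    stack = [concept]
    while stack:
        for child in TAXONOMY.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


print(sorted(narrower_terms("money")))  # ['currency', 'dollar', 'pound', 'yen']
```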
CognitionSearch has greater Recall, in part, because it can find the base stem of words that have been inflected, whether regularly or irregularly, so it knows that the base stem of "babies" is "baby", and the base stem of "caught" is "catch". In addition, it computes the stem from derived words, so it knows that "prescreen" has the base stem "screen". CognitionSearch recognizes millions of inflected and derived words.
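The three examples above correspond to three kinds of lookup: a regular rule ("-ies" to "-y"), an irregular form, and a derivational prefix. The sketch below illustrates them with tiny hypothetical tables (`IRREGULAR`, `PREFIXES`, `KNOWN_STEMS`); it is an assumption-laden toy, not the product's morphology component.

```python
# Hypothetical lookup tables, limited to the examples in the text.
IRREGULAR = {"caught": "catch"}           # irregular inflected forms
PREFIXES = ("pre",)                       # derivational prefixes
KNOWN_STEMS = {"baby", "catch", "screen"}


def base_stem(word):
    """Return the base stem of `word`, or the word itself if none is found."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    # Regular plural rule: "babies" -> "baby"
    if word.endswith("ies") and word[:-3] + "y" in KNOWN_STEMS:
        return word[:-3] + "y"
    # Derived word: strip a known prefix only if a known stem remains.
    for prefix in PREFIXES:
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in KNOWN_STEMS:
            return rest
    return word


print([base_stem(w) for w in ["babies", "caught", "prescreen"]])
# ['baby', 'catch', 'screen']
```

Checking the remainder against known stems keeps a word such as "press" from being wrongly reduced to "ss".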
Precision is improved by the presence of over 191,000 phrases of English included within CognitionSearch. Thus the software does not confuse the parts of a phrase with the phrase itself. As an example, in response to a query with "Bill of Rights", it doesn't retrieve a document talking about the right to bill a customer, or other uses of those words. Phrases exist in synonym classes, so that the spelled-out form of an acronym is mapped to the acronym. For example, "SEC" is mapped to "Securities and Exchange Commission", and "OCD" to "obsessive compulsive disorder".
The combination of all of these linguistic techniques and semantic databases creates a powerful tool for searching on meaning. That makes the user experience far more efficient, delivering just the right information and all the right information in response to user queries.
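One common way to treat a phrase as a single unit is longest-match tokenization against a phrase table, sketched below. The `PHRASES` table, the token names and the function `tokenize` are hypothetical, and nothing here is claimed to mirror how CognitionSearch stores its phrases.

```python
import re

# Hypothetical phrase table: word sequences mapped to a single token.
# The spelled-out form of an acronym maps to the same token as the acronym.
PHRASES = {
    ("bill", "of", "rights"): "bill_of_rights",
    ("securities", "and", "exchange", "commission"): "sec",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in PHRASES)


def tokenize(text):
    """Split text into tokens, merging the longest known phrase at each position."""
    words = re.findall(r"[a-z]+", text.lower())
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate first, down to two-word phrases.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 1, -1):
            candidate = tuple(words[i:i + n])
            if candidate in PHRASES:
                tokens.append(PHRASES[candidate])
                i += n
                break
        else:
            # No phrase starts here; keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens


print(tokenize("the Bill of Rights"))            # ['the', 'bill_of_rights']
print(tokenize("the right to bill a customer"))  # ['the', 'right', 'to', 'bill', 'a', 'customer']
```

Because the phrase becomes one token, a query on "Bill of Rights" no longer matches documents that merely contain "bill" and "right" separately.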
Apache Lucene
For the purposes of this comparison, we use Apache Lucene, a widely available open-source Search program, as a proxy for a traditional Search engine. Unfortunately, due to the way traditional Search engines such as Google, Yahoo!, Ask and others return and display their results, we are unable to use them directly for this comparison (i.e. it would not be an apples-to-apples comparison and would unfairly favor CognitionSearch). Lucene is a popular, robust and high-performance full-text indexing and Search application. Although Lucene alone is not a Web Search engine in its own right, we use its functionality as a demonstrative example of traditional full-text statistical pattern-matching indexing and Search.
In order for the comparison tests to be unbiased, Lucene has not been specially configured for this application, either to aid or hinder its performance. We are using its default settings:
- For parsing user-provided input, the Comparison application uses Lucene's standard QueryParser.
- No special settings, for example with respect to similarity or phrase slop, are set for the QueryParser.
- During indexing, the default merge factor of 10 is used.
- No special boosting of terms is applied.
- No custom scoring has been implemented.