| Concept | Definition | Wikipedia |
|---|---|---|
| Clustered search | Search results are grouped into "clusters," which organize the results into categories. These clusters are created dynamically at query time and let users drill down and navigate within the results. | Clustering |
| Conceptual search | Retrieval of documents based on a combination of keyword matching and conceptual matching. Documents are automatically classified to determine the concepts to which they belong. | Latent semantic analysis |
| Faceted search | Enables a user to navigate information hierarchically, going from a category to its sub-categories, while choosing the order in which the categories are presented. | Informative Faceted Searching (IFS) |
| Federated search | Also known as meta-searching or cross-database searching, it is a technology that allows users to search many networked information resources from one interface. | Federated search |
| Gatherer | Also called indexer or crawler. The crawling component of the Search service. The gatherer crawls content sources, extracts each source's data, and breaks that data down so that it can be placed in an index and searched. | |
| Gatherer / Indexer | The gatherer / indexer is the service that controls the crawling process. | |
| IFilters / filters | A filter extracts a stream of textual information from a document, discarding all non-textual and formatting information, so that the text can be added to the search index. | |
| Image search | Information retrieval designed to help the user find images, pictures, animations, etc. | Image search engine |
| Intent-driven search | Search that uses machine learning to infer the intent behind a query and offer the user a choice of results that match it. | |
| Linguistic search | Linguistic analysis involves finding word boundaries (word breaking) and reducing inflected words to their root forms (stemming). | |
| Meta-search | The combining of results from multiple search engines. | Metasearch engine |
| Natural Language Processing (NLP) | Algorithms that allow a search to process and understand human languages. | Natural Language Processing |
| Near duplicate algorithms | .. | |
| Ontology | The categories of things within a domain. | Ontology (computer science) |
| Personalized search | A search interface that "reads your mind" and personalizes itself around your preferences and the sites you visit. | Personalization |
| Probabilistic model search | Estimates the probability that the user will find a particular document relevant. | Information retrieval |
| Protocol handlers | A protocol handler can access data over a particular protocol or from a particular store. Common protocol handlers include the file protocol, Hypertext Transfer Protocol (HTTP), Messaging Application Programming Interface (MAPI), and HTTP Distributed Authoring and Versioning (HTTPDAV). The protocol handler processes the URLs passed to it by the gatherer. | |
| Proximity search | A search that lets users specify that returned documents must contain the search words near each other. | Proximity search (text) |
| Relevance feedback | A technique that takes the results initially returned for a query, gathers information about whether those results are relevant, and uses it to perform a new query. | Relevance feedback |
| Search algorithm | The defined set of rules put in place by a search engine to measure and sort the web page listings that will be displayed in response to a search query. | Search algorithm |
| Search precision | The ratio of the number of relevant documents retrieved to the total number of documents retrieved. | Information retrieval |
| Search ranking | A search engine's attempt to provide the "best" results first. | Search engine |
| Search relevancy | How well a document provides the information a user is looking for, as measured by the user. | Relevance (information retrieval) |
| Stemming | A technique for reducing words to their grammatical roots. | Stemming |
| Taxonomies | A set of agreed-upon terminologies and principles of classification. | Taxonomy |
| Vector space model for search | A classic model of document retrieval based on representing documents and queries as vectors of index terms. | Vector space model |
| Word breakers and stemmers | A word breaker is a component that determines where the word boundaries are in the stream of characters in the query or in the document being crawled. A stemmer extracts the root form of a given word. For example, "running," "ran," and "runner" are variants of the word "run." In some languages, a stemmer also expands the root form of a word to its alternate forms. | |
| Word Breaking | Finding word boundaries in linguistic analysis. | |
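As a small worked example of search precision, the sketch below computes precision for a single query. The document IDs and relevance judgments are invented for illustration, and treating an empty result list as precision 0 is a convention assumed here.

```python
def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Relevant documents retrieved divided by total documents retrieved."""
    if not retrieved:
        return 0.0  # assumed convention: nothing retrieved means precision 0
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    return hits / len(retrieved)

# Hypothetical judgments: 4 documents retrieved, 3 of them are relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d4", "d9"}
print(precision(retrieved, relevant))  # 0.75
```

Note that the relevant document `d9` was never retrieved; it does not affect precision (it would lower recall, a separate measure).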
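The vector space model can be sketched with TF-IDF term weights and cosine similarity between the query vector and each document vector. TF-IDF is one common weighting scheme, assumed here rather than prescribed by the table; tokenization is a naive lowercase whitespace split, and the helper names `tokenize`, `cosine`, and `rank` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Naive tokenizer assumed for the sketch: lowercase, split on whitespace.
    return text.lower().split()

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / d) for t, d in df.items()}
    # Each document becomes a vector of term frequency x inverse document frequency.
    doc_vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    # Query terms unseen in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [(i, cosine(q_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = ["search engines rank documents", "cats and dogs", "vector space model for search"]
print([i for i, _ in rank("vector search", docs)])  # [2, 0, 1]
```

Document 2 contains both query terms, document 0 only the more common term "search", and document 1 neither, which gives the ordering shown.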
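A proximity condition can be checked by comparing word positions. The sketch below assumes distance is measured in word positions (a distance of 2 allows one word in between) and uses a hypothetical `within` helper that rescans the text; a production engine would more likely read positions from a positional index built at indexing time.

```python
import re

def within(text: str, term_a: str, term_b: str, max_distance: int) -> bool:
    """True if term_a and term_b occur at most max_distance word positions apart."""
    words = re.findall(r"[^\W_]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Compare every pair of positions; fine for a sketch, quadratic in the worst case.
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

sentence = "The quick brown fox jumps over the lazy dog"
print(within(sentence, "quick", "fox", 2))  # True: positions 1 and 3
print(within(sentence, "quick", "dog", 3))  # False: positions 1 and 8
```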
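The word breaker and stemmer described above can be illustrated with a toy pipeline. This is a minimal sketch, not Porter stemming or any real product's component: the word breaker treats runs of letters and digits as words (so it would not handle unsegmented scripts such as Chinese or Japanese), and the suffix list and the `IRREGULAR` lookup table are hypothetical, included because a form like "ran" cannot be reached by stripping suffixes.

```python
import re

# Hypothetical exception table for irregular forms a suffix stripper cannot reach.
IRREGULAR = {"ran": "run"}

# Toy suffix list, tried in order; far smaller than a real stemmer's rule set.
SUFFIXES = ("ing", "er", "ed", "s")

def break_words(text: str) -> list[str]:
    """Naive word breaker: runs of letters/digits are words, anything else is a boundary."""
    return re.findall(r"[^\W_]+", text)

def stem(word: str) -> str:
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Keep at least three characters of root so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            root = word[: -len(suffix)]
            # Undo consonant doubling: "runn" -> "run", "stemm" -> "stem".
            if root[-1] == root[-2] and root[-1] not in "aeiouls":
                root = root[:-1]
            return root
    return word

print([stem(w) for w in break_words("Running, ran, runner, runs")])
# ['run', 'run', 'run', 'run']
```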
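One classic way to perform relevance feedback is the Rocchio algorithm, which moves the query vector toward the centroid of documents judged relevant and away from the centroid of those judged non-relevant. The table does not prescribe Rocchio; it is named here as one well-known technique. The weights `alpha=1.0`, `beta=0.75`, `gamma=0.15` are commonly cited defaults assumed for the sketch, and vectors are plain term-to-weight dictionaries.

```python
from collections import defaultdict

Vector = dict[str, float]

def centroid(vectors: list[Vector]) -> Vector:
    """Average of sparse term-weight vectors."""
    if not vectors:
        return {}
    total: dict[str, float] = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}

def rocchio(query: Vector, relevant: list[Vector], nonrelevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Build a new query: alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    rel = centroid(relevant)
    non = centroid(nonrelevant)
    new_query: Vector = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel.get(term, 0.0)
                  - gamma * non.get(term, 0.0))
        if weight > 0:  # negative weights are clipped (dropped)
            new_query[term] = weight
    return new_query

# Hypothetical feedback: the user marked a car page relevant and a cat page non-relevant.
print(rocchio({"jaguar": 1.0},
              [{"jaguar": 1.0, "car": 1.0}],
              [{"jaguar": 1.0, "cat": 1.0}]))
```

In this example "jaguar" ends up with weight of about 1.6, "car" is added with weight 0.75, and "cat" would get a negative weight and is dropped; running the new query then favors car-related documents.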
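Meta-search has to merge several ranked result lists into one. The table does not say how; one simple, widely used option is reciprocal rank fusion (RRF), sketched below under the assumptions that each engine returns a ranked list of unique document IDs and that the smoothing constant is `k=60`.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Each document scores sum(1 / (k + rank)) over every list that contains it."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.__getitem__, reverse=True)

engine_a = ["x", "y", "z"]  # hypothetical results from one engine
engine_b = ["y", "w", "x"]  # hypothetical results from another
print(reciprocal_rank_fusion([engine_a, engine_b]))  # ['y', 'x', 'w', 'z']
```

RRF uses only rank positions, so it avoids comparing raw scores from engines whose scoring scales are unrelated.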
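A well-known ranking function descended from the probabilistic retrieval model is Okapi BM25. The sketch below is one BM25 variant (the `+ 1.0` inside the logarithm keeps IDF positive, as some implementations do) with the commonly used parameters `k1=1.2` and `b=0.75`; whitespace tokenization and the function name `bm25_scores` are assumptions for illustration.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with a BM25 variant."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n_docs
    if avgdl == 0:
        return [0.0] * n_docs  # every document is empty
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n_t = df[term]
            # Rarer terms carry more evidence of relevance.
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            freq = tf[term]
            # Term frequency saturates via k1; b normalizes for document length.
            denom = freq + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = ["probabilistic retrieval model", "the cat sat", "probabilistic probabilistic ranking"]
print(bm25_scores("probabilistic", docs))
```

The third document scores highest because it repeats the query term at the same length as the first; the second scores 0 because it does not contain the term.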
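The gatherer breaks crawled data down "so that it can be placed in an index and searched." The table does not name the index structure; a common choice is an inverted index mapping each term to the documents that contain it. The sketch below assumes the crawled text is already available as a dictionary of document ID to text, and `build_inverted_index` and `search_all` are hypothetical names.

```python
import re
from collections import defaultdict

def terms_of(text: str) -> list[str]:
    # Same naive word breaking as elsewhere: runs of letters/digits, lowercased.
    return re.findall(r"[^\W_]+", text.lower())

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in terms_of(text):
            index[term].add(doc_id)
    return dict(index)

def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Documents containing every query term (Boolean AND)."""
    terms = terms_of(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

crawled = {"a": "Crawl content sources", "b": "Index content for search"}
index = build_inverted_index(crawled)
print(search_all(index, "content search"))  # {'b'}
```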
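The filter's job, keeping text and discarding formatting, can be illustrated for HTML with Python's standard `html.parser` module. This is a stand-in sketch, not the IFilter interface itself: the `TextExtractor` class is hypothetical, it skips `<script>` and `<style>` contents, and it joins text fragments with spaces, so a word split across tags (as in `<b>sea</b>rch`) would come out as two words.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, dropping markup and the contents of <script>/<style>."""
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible, non-blank text.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

page = "<html><head><style>p{color:red}</style></head><body><p>Hello <b>search</b> index</p></body></html>"
print(extract_text(page))  # Hello search index
```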