Everything the crawler finds goes into the second part of the search engine: the index. The indexer takes every word on a web page, logs it, categorizes it, and then stores the results in a huge database. Indexing every word lets most search engines go beyond simple keyword searches and support proximity searching, which finds words that appear close to each other.
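To make the idea of logging every word concrete, here is a minimal sketch of a positional index with a proximity search on top of it. The function names `build_index` and `near`, the `max_gap` parameter, and the sample pages are made up for illustration; real indexers are far more elaborate and no particular search engine is known to work exactly this way.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to {page_id: [word positions on that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][page_id].append(pos)
    return index

def near(index, a, b, max_gap=3):
    """Return the ids of pages where a and b occur within max_gap words."""
    pages_a = index.get(a, {})
    pages_b = index.get(b, {})
    hits = set()
    # Only pages that contain both words can match.
    for page_id in pages_a.keys() & pages_b.keys():
        if any(abs(i - j) <= max_gap
               for i in pages_a[page_id]
               for j in pages_b[page_id]):
            hits.add(page_id)
    return hits

# Hypothetical sample pages, not real crawl data.
pages = {
    "p1": "the quick brown fox jumps over the lazy dog",
    "p2": "a fox is an animal and a dog is a different animal entirely",
}
index = build_index(pages)
print(near(index, "fox", "dog", max_gap=5))
```

Because the index stores positions and not just page ids, the same structure answers both a plain keyword lookup and a "within N words" query.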
Some indexers also index the HTML markup, which lets the search engine search specific parts of a web page, such as its URL or title. Most of these special search features are available in the advanced search areas of nearly all the major search engines. The Help section of each search tool shows you how to get the best results from that specific search engine.
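A rough sketch of what indexing by page field might look like is shown below, using Python's standard `html.parser` module to pull out the title. The helper names `index_fields` and `search_field`, and the example.com URLs, are assumptions made for this illustration only.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def index_fields(url, html):
    """Build one record holding the searchable fields of a page."""
    parser = TitleParser()
    parser.feed(html)
    return {"url": url.lower(), "title": parser.title.strip().lower()}

def search_field(records, field, term):
    """Return the URLs of records whose given field contains term."""
    return [r["url"] for r in records if term.lower() in r[field]]

# Hypothetical pages for illustration.
records = [
    index_fields("http://example.com/cats",
                 "<html><head><title>Cat Care Tips</title></head>"
                 "<body>Dogs are mentioned here.</body></html>"),
    index_fields("http://example.com/dogs",
                 "<html><head><title>Dog Training</title></head>"
                 "<body>No cats.</body></html>"),
]
print(search_field(records, "title", "dog"))
```

Note that the title search for "dog" skips the first page even though its body mentions dogs, which is the point of a field-restricted search.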
There is a time lag between when a web page is crawled and when it is indexed. Until a page is indexed, it is unavailable to search engine users: it exists in the search engine's system but is not yet accessible to you. This is why you have to be skeptical of some of the boasts of search tools. For example, when Google announced in February 2004 that it had increased its total number of pages to 4.28 billion, it did not mention that a portion of those pages were unindexed. Yes, you still had access to billions of Google's pages, just not all 4.28 billion!
If a crawler finds changes on a web page, it updates the index to include the new information. The word "index" implies categorization and classification, activities that require human assessment and interpretation. In reality, the indexing for a search engine is done by computer software, and the rankings of the responses, or hits, are calculated by mathematical formulas as well.
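One simple way software could notice a changed page and refresh the index is to compare a hash of the page content against the hash stored at the last indexing. The sketch below assumes that approach; the `Index` class and its attributes are hypothetical, and actual search engines may detect changes quite differently.

```python
import hashlib

class Index:
    """Toy index that re-indexes a page only when its content changes."""
    def __init__(self):
        self.postings = {}  # word -> set of URLs containing it
        self.hashes = {}    # URL -> hash of the last indexed content
        self.words = {}     # URL -> set of words last indexed for it

    def update(self, url, text):
        """Index text for url; return True if the index was changed."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.hashes.get(url) == digest:
            return False  # content unchanged, nothing to do
        # Drop the page's old words before adding the new ones.
        for word in self.words.get(url, set()):
            self.postings[word].discard(url)
        new_words = set(text.lower().split())
        for word in new_words:
            self.postings.setdefault(word, set()).add(url)
        self.words[url] = new_words
        self.hashes[url] = digest
        return True

idx = Index()
first = idx.update("http://example.com/", "old news")
repeat = idx.update("http://example.com/", "old news")
changed = idx.update("http://example.com/", "fresh news")
print(first, repeat, changed)
```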
To improve performance, many search engines eliminate certain common words like "is," "and," "or," and "of." These are called "stop words" because they add no real benefit to the search. Search engines also focus their searches by stripping punctuation and converting all letters to lowercase. It is important to remember that each search engine has its own rules and ways of working.
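The three clean-up steps described above (lowercasing, stripping punctuation, dropping stop words) can be sketched in a few lines. The stop word list here contains only the four words named in this article, and the function name `normalize` is an assumption for the example; each search engine uses its own list and rules.

```python
import string

# Only the stop words named in the article; real lists are longer.
STOP_WORDS = {"is", "and", "or", "of"}

def normalize(text):
    """Lowercase text, strip ASCII punctuation, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return [w for w in words if w not in STOP_WORDS]

print(normalize("The History of Rock and Roll, Revisited!"))
```

The same function would normally be applied both to page text at indexing time and to the user's query, so that the two are compared in the same form.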
The third part of a search engine is its query processing capability, the complicated part of the process. The search engine takes the query, searches the index, and weighs many different factors to decide what is relevant and what is not, all before the results are returned. The exact process differs with every search engine, and the search engine companies closely guard the specific mathematical algorithms used to make their calculations. The big difference between engines is the way relevance is calculated.
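Since the real ranking formulas are kept secret, the sketch below uses TF-IDF, a well-known textbook weighting, purely as a stand-in to show how a mathematical formula can order results by relevance. The function name `rank`, the exact form of the formula, and the sample documents are assumptions for illustration, not any search engine's actual method.

```python
import math
from collections import Counter

def rank(query, docs):
    """Return doc ids sorted by a simple TF-IDF score, best first."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many docs contain the term at all.
            df = sum(1 for w in tokenized.values() if term in w)
            if df == 0 or counts[term] == 0:
                continue
            tf = counts[term] / len(words)    # how prominent the term is here
            idf = math.log(1 + n_docs / df)   # rarer terms count for more
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical documents for illustration.
docs = {
    "a": "search engines rank pages",
    "b": "search search engines index and rank pages for search",
    "c": "cooking recipes",
}
print(rank("search", docs))
```

Changing the formula inside the loop changes the order of the results, which is why two search engines can return different rankings for the same query over similar pages.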
Note: This article was sent to us by: Ryan Heidels at 08272010
© 2009 ArticleInput.com.