How does the indexer of a search engine actually work?


What is the search engine indexer?

Everything the crawler finds goes into the second part of the search engine: the index. The indexer takes every word on a web page, logs it, categorizes it and then stores the results in a huge database. Because every word is indexed along with where it appears on the page, most search engines can go beyond simple keyword searches and support proximity searching, which finds words that appear close to each other.
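A common textbook way to support both keyword and proximity searching is a positional inverted index, which maps each word to the pages it appears on and its positions within each page. The Python sketch below is a hypothetical illustration of that idea (the `PositionalIndex` class and its methods are invented here), not how any particular search engine stores its data.

```python
from collections import defaultdict


class PositionalIndex:
    """Toy inverted index mapping word -> {page_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add_page(self, page_id, text):
        # Log every word together with its position on the page.
        for pos, word in enumerate(text.lower().split()):
            self.postings[word][page_id].append(pos)

    def search(self, word):
        # Plain keyword search: every page that contains the word.
        return set(self.postings.get(word.lower(), {}))

    def near(self, word_a, word_b, max_distance):
        # Proximity search: pages where both words occur within
        # max_distance positions of each other.
        a = self.postings.get(word_a.lower(), {})
        b = self.postings.get(word_b.lower(), {})
        hits = set()
        for page_id in a.keys() & b.keys():
            if any(abs(i - j) <= max_distance
                   for i in a[page_id] for j in b[page_id]):
                hits.add(page_id)
        return hits


index = PositionalIndex()
index.add_page("p1", "search engines build an index of every word")
index.add_page("p2", "every engine has a search box and an index")
print(sorted(index.search("index")))        # ['p1', 'p2']
print(index.near("search", "engines", 1))  # {'p1'}
```

Storing positions costs extra space, but without them the index could only say that two words are on the same page, not that they are next to each other.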

Some indexers also index the HTML markup, which lets the search engine search specific parts of a page, such as its URL or title. Most of these special search features are available in the advanced search areas of nearly all the major search engines, and each search tool's Help section explains how to get the best results from that particular engine.
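To make field searches such as "title only" possible, an indexer can record which part of the page each word came from. The sketch below assumes a simple scheme invented for illustration: words are keyed by `(field, word)`, and a query like `title:indexer` restricts the search to one field. The class names and the `field:word` query syntax are hypothetical; only Python's standard `html.parser` module is real.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class TitleExtractor(HTMLParser):
    """Separates <title> text from the rest of the page text."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.body_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        target = self.title_parts if self.in_title else self.body_parts
        target.append(data)


class FieldIndex:
    FIELDS = ("title", "url", "body")

    def __init__(self):
        self.postings = defaultdict(set)  # (field, word) -> set of URLs

    def add_page(self, url, html):
        parser = TitleExtractor()
        parser.feed(html)
        parser.close()
        fields = {
            "title": " ".join(parser.title_parts),
            "url": url,
            "body": " ".join(parser.body_parts),
        }
        for field, text in fields.items():
            for word in tokenize(text):
                self.postings[(field, word)].add(url)

    def search(self, query):
        # "title:indexer" searches one field; "indexer" searches all of them.
        field, sep, word = query.partition(":")
        fields = (field,) if sep else self.FIELDS
        word = word if sep else field
        return set().union(*(self.postings.get((f, word.lower()), set())
                             for f in fields))
```

With two example pages, a plain search for `indexer` would match a page that only mentions the word in its body, while `title:indexer` would match only pages whose title contains it.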

There is a time lag between when a web page is crawled and when it is indexed. Until a page is indexed, it is unavailable to search engine users: it exists in the engine's system, but you cannot yet retrieve it. This is why you should be skeptical of some search tools' claims about their size. For example, when Google announced in February 2004 that its total number of pages had grown to 4.28 billion, it did not mention that a portion of those pages had not yet been indexed. Yes, you still had access to billions of Google's pages, just not all 4.28 billion!
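The gap between "pages the engine has" and "pages you can find" can be modeled with a backlog that the indexer works through over time. The Python sketch below is purely hypothetical (`ToyEngine`, `run_indexer` and `page_counts` are invented names, and the batch size is arbitrary); it only shows why a count of crawled pages can be larger than the number of searchable pages.

```python
from collections import deque


class ToyEngine:
    """Hypothetical model of the lag between crawling and indexing."""

    def __init__(self):
        self.pending = deque()  # crawled pages waiting for the indexer
        self.index = {}         # word -> set of URLs users can find
        self.crawled = set()    # every URL the engine has fetched
        self.indexed = set()    # URLs that have been through the indexer

    def crawl(self, url, text):
        self.crawled.add(url)
        self.pending.append((url, text))

    def run_indexer(self, batch_size):
        # The indexer works through the backlog one batch at a time.
        for _ in range(min(batch_size, len(self.pending))):
            url, text = self.pending.popleft()
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(url)
            self.indexed.add(url)

    def search(self, word):
        return set(self.index.get(word.lower(), set()))

    def page_counts(self):
        # (pages the engine "has", pages users can actually find)
        return len(self.crawled), len(self.indexed)
```

After crawling two pages and indexing only one, `page_counts()` reports two known pages but only one searchable page, and a search for a word on the second page returns nothing until the indexer catches up.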

When a crawler finds changes on a web page, the index is updated to include the new information. The word "index" suggests categorization and classification, activities that normally require human assessment and interpretation. In reality, search engine indexing is done by computer (software, actually), and the rankings of the responses, or hits, are calculated by mathematical formulas as well.
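One simple way to detect that a page has changed is to store a fingerprint (hash) of its content and compare it on the next crawl; if it differs, the page's old entries are removed and the new words are indexed. The sketch below assumes that approach for illustration only; `UpdatableIndex` is a hypothetical class, and real engines use their own change-detection rules.

```python
import hashlib
from collections import defaultdict


class UpdatableIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs
        self.page_words = {}              # URL -> words currently indexed
        self.page_hash = {}               # URL -> fingerprint of last version

    def add_or_update(self, url, text):
        """Re-index url only if its content changed; return True if updated."""
        fingerprint = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.page_hash.get(url) == fingerprint:
            return False  # unchanged since the last crawl: nothing to do
        # Remove postings that came from the old version of the page.
        for word in self.page_words.get(url, set()):
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]
        # Index the new version.
        words = set(text.lower().split())
        for word in words:
            self.postings[word].add(url)
        self.page_words[url] = words
        self.page_hash[url] = fingerprint
        return True

    def search(self, word):
        return set(self.postings.get(word.lower(), set()))
```

Removing the old postings matters: without that step, a search for a word that has since been deleted from the page would still return it.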

To improve performance, many search engines eliminate certain common words like "is," "and," "or," and "of." These are called "stop words" because they add no real benefit to a search. Search engines have also taken other steps to focus their searches, such as eliminating punctuation and converting all letters to lowercase. It is important to remember that each search engine has different rules and ways of working.
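The normalization steps described above can be sketched as a single function applied to both page text and queries, so that "Index," and "index" end up as the same entry. The stop-word list here contains only the four examples from the text; real engines use their own lists, and the function name `normalize` is hypothetical.

```python
import re

# Only the example stop words mentioned above; real lists differ per engine.
STOP_WORDS = {"is", "and", "or", "of"}


def normalize(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    return [word for word in cleaned.split() if word not in STOP_WORDS]


print(normalize("The Index is FAST, and it's huge!"))
# ['the', 'index', 'fast', 'its', 'huge']
```

Because the query goes through the same function, a search for "Index OR speed" reduces to the terms `index` and `speed` before the index is consulted.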

Query Process

The third part of a search engine is its query processing capability, the most complicated part of the process. The search engine takes the query, searches the index, and weighs many different factors to decide what is relevant and what is not, all before the results are returned. The exact process differs from one search engine to another, and search engine companies closely guard the specific mathematical algorithms they use to make their calculations. The biggest difference between engines is the way relevance is calculated.
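Since the real ranking formulas are secret, the example below uses TF-IDF, a well-known textbook scoring method, only to show what "calculating relevance with a formula" can look like. It is not the algorithm of Google or any other engine. The `rank` function and the `+1` added to the IDF term are choices made for this sketch: a page scores higher when a query word appears often on it (term frequency) and rarely on other pages (inverse document frequency).

```python
import math
from collections import Counter


def rank(query, pages):
    """Score pages against a query with TF-IDF; return URLs, best first.

    pages: dict mapping URL -> page text. Textbook scoring only; real
    engines weigh many more (undisclosed) factors.
    """
    docs = {url: Counter(text.lower().split()) for url, text in pages.items()}
    n = len(docs)
    scores = {}
    for url, counts in docs.items():
        total = sum(counts.values())
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for c in docs.values() if term in c)  # pages with term
            if df == 0 or term not in counts:
                continue
            tf = counts[term] / total
            idf = math.log(n / df) + 1  # +1 keeps very common terms above zero
            score += tf * idf
        if score > 0:
            scores[url] = score
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

For example, with pages `"a": "index index crawler"` and `"b": "crawler crawler crawler index"`, the query `index` ranks page `a` first, while the query `crawler` ranks page `b` first.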

Legal Disclaimer

Our website is not responsible for the information contained in this article. Articleinput.com is a free articles resource, so practically any visitor can submit an article. However, if you notice any copyrighted material, please contact us and we will remove the article(s) in question right away.

Note: This article was sent to us by Ryan Heidels on 08/27/2010.
