Crawlers and the way they work


In the early 1990s, search "robots," now better known as "crawlers" or "spiders," were developed. These were automated programs that did not need people to help them locate and index content. 1994 was the watershed year for the Web's takeoff. Besides the first sophisticated crawlers, that year saw two Stanford University graduate students use crawlers to find links, then hand-select them to build the directory that became Yahoo!. The next year, a new generation of search engines appeared, including Infoseek, AltaVista, and Excite, each offering different features.

Search engines, compiled by computers, and subject directories, compiled by human beings, were developed to find and index documents and to point you to the most relevant documents in response to your keyword query. That worked at first because early web pages were mostly text, written as simple HyperText Markup Language (HTML) documents.
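To make the "find and index" step concrete, here is a minimal sketch of what a crawler does: start at one page, follow its links breadth-first, and record which words appear on which pages (an inverted index). It runs over a hypothetical in-memory `TOY_WEB` instead of the real Web; an actual crawler would fetch pages over HTTP and parse the HTML for links. The names `TOY_WEB` and `crawl` are illustrative only.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to (page text, outgoing links).
# A real crawler would download each page and extract links from its HTML.
TOY_WEB = {
    "a.html": ("search engines index documents", ["b.html", "c.html"]),
    "b.html": ("crawlers follow links", ["c.html", "a.html"]),
    "c.html": ("directories are built by people", ["d.html"]),
    "d.html": ("video and sound files", []),
}


def crawl(start, web):
    """Breadth-first crawl from `start`; returns (visit order, inverted index)."""
    seen = {start}
    queue = deque([start])
    order = []
    index = {}  # word -> set of URLs whose text contains that word
    while queue:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to fetch
            continue
        text, links = web[url]
        order.append(url)
        # Only the page's text is indexed; non-text content is invisible here.
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order, index


order, index = crawl("a.html", TOY_WEB)
print(order)  # ['a.html', 'b.html', 'c.html', 'd.html']
print(sorted(index["links"]))  # ['b.html']
```

Note that the indexer only ever sees words, which is one simple way to picture why early text-oriented tools struggled once pages began carrying sound, video, and graphics.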

Web pages soon began to carry information in many formats, including sound and video. Search tools fell behind, both in keeping up with the Web's rapid growth and in recognizing and indexing non-text information such as graphics.

These pages that fall through the cracks are part of what Price and Sherman call the "invisible web," an area that is growing even faster than the Internet as a whole. A Cyveillance.com study of July 2001 estimated the size of the Internet at 2.5 billion documents, growing at a rate of 7.5 million documents per day. Another study, by BrightPlanet.com, estimated that the pages not indexed by search tools outnumber those already indexed by a whopping 400 to 550 times. Price and Sherman, among others, believe the BrightPlanet study overstates the case; they estimate that the "invisible web" is two to fifty times larger than the visible web. They have also developed innovative methods for searching "invisible web" resources.
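To put the Cyveillance figures in perspective, the arithmetic below works out the daily growth rate and a one-year projection. The projection assumes, purely for illustration, that the rate of 7.5 million new documents per day stays constant (linear growth); the study itself is not claimed to make that projection.

```latex
\[
\frac{7.5 \times 10^{6}\ \text{documents/day}}{2.5 \times 10^{9}\ \text{documents}}
  = 3 \times 10^{-3} = 0.3\%\ \text{per day}
\]
\[
\underbrace{2.5 \times 10^{9}}_{\text{July 2001}}
  + \underbrace{365 \times 7.5 \times 10^{6}}_{\text{one year of additions}}
  = 2.5 \times 10^{9} + 2.7375 \times 10^{9}
  = 5.2375 \times 10^{9}\ \text{documents}
\]
```

Under that assumption, a single year of additions would more than double the size of the Web.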

Browsing is the process of following a series of hypertext links, pointing and clicking your way through a collection of documents. It is a good way to look through a limited amount of information on a particular subject, but it is an inadequate method if you are looking through a huge number of documents. Searching relies on software that matches the keywords you specify in order to locate the most relevant documents in its index.
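The keyword-matching step can be sketched in a few lines. This minimal example builds an inverted index over a hypothetical three-document collection (`DOCS`) and scores each document by how many times it contains the query words. Raw word counts are a deliberately simple stand-in chosen for illustration; real search engines use far more elaborate relevance formulas, and `build_index` and `search` are hypothetical names.

```python
from collections import Counter

# Hypothetical mini-collection standing in for a search engine's index.
DOCS = {
    "doc1": "crawlers follow links and index pages",
    "doc2": "subject directories are organized by people",
    "doc3": "search engines index pages and rank pages by keywords",
}


def build_index(docs):
    """Inverted index: word -> Counter mapping doc_id to occurrences of word."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, Counter())[doc_id] += 1
    return index


def search(query, index):
    """Return matching doc ids, most relevant first.

    Relevance here is simply the total count of query words in a document.
    """
    scores = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, Counter()))  # add this word's counts
    # Highest score first; ties broken alphabetically by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(DOCS)
print(search("index pages", index))  # ['doc3', 'doc1']
```

Here `doc3` ranks first because "pages" appears in it twice, while `doc2` matches neither keyword and is not returned at all.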

So, when you are looking at an organized category of information, browsing can be a useful technique. If you are facing a large, unorganized collection of documents, searching is the more efficient way to find information.

This distinction matters because search tools use two different methods to help you find what you are looking for. Subject directories, organized by human beings into hierarchical categories, are a great way to study a small number of subjects in their proper context: you look at one category, and within it are several sub-categories of that subject. Search engines, by contrast, are built by computers around keywords or phrases; they offer no context but let you research large numbers of subjects. They have no hierarchical structure. Instead, a search engine uses mathematical formulas and algorithms to find relationships and compute correlations between subjects. When you search, the documents the computer judges most relevant are presented first in a ranked list of results.
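A subject directory's structure can be pictured as a tree that you walk one category at a time. The sketch below models a hypothetical directory (`DIRECTORY`) as nested dictionaries, with lists of site names at the leaves; `browse` and `listing` are illustrative names, not part of any real directory's software.

```python
# Hypothetical subject directory: each category maps to its sub-categories,
# and the leaves list the hand-picked sites filed under them.
DIRECTORY = {
    "Science": {
        "Astronomy": ["site-a", "site-b"],
        "Biology": {"Genetics": ["site-c"], "Ecology": ["site-d"]},
    },
    "Arts": {"Music": ["site-e"]},
}


def browse(directory, path):
    """Follow a path of category names down the tree, as a user clicking
    from a category into one of its sub-categories would."""
    node = directory
    for category in path:
        node = node[category]  # raises KeyError if the category doesn't exist
    return node


def listing(node):
    """Sub-category names at a category, or the sites filed under a leaf."""
    return sorted(node) if isinstance(node, dict) else list(node)


print(listing(browse(DIRECTORY, ["Science"])))  # ['Astronomy', 'Biology']
print(listing(browse(DIRECTORY, ["Science", "Biology", "Genetics"])))  # ['site-c']
```

Every site is reached through its chain of parent categories, which is what supplies the context; a search engine keeps a flat index with no such tree and orders its results by a computed score instead.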

Both techniques, browsing and searching, are very important in your efforts to find what you are looking for quickly and efficiently. Despite the increasing sophistication of Internet search tools, none of the thousands of search tools in existence can keep up with the mushrooming number of pages on the Internet. A 1999 study found that, at best, the top search tools indexed about fifteen percent of the sites available. No newer studies have been done, but search experts speculate that the figure is about thirty percent these days.

As it stands now, no single search tool comes close to indexing most of the Internet. Search tools keep growing more sophisticated, but that progress is dwarfed by the number of pages added to the Internet every day. Moreover, the search engines overlap very little. Greg Notess, a highly respected search engine expert, found that the major search engines rarely returned the same results, and that there was surprisingly little overlap between the major search engines and directories. Notess's tests, which can be found on his excellent website SearchEngineShowDown.com, used obscure words, which give a more accurate picture of how much overlap truly exists.
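One simple way to put a number on "overlap" is the Jaccard index: the URLs two engines both return, divided by all distinct URLs either returns. This is offered only as an illustrative measure; it is not presented as the method Notess used, and the result lists below are hypothetical.

```python
def overlap(results_a, results_b):
    """Jaccard index: shared URLs divided by all distinct URLs returned."""
    a, b = set(results_a), set(results_b)
    if not (a or b):  # both engines returned nothing
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical result lists for the same obscure query on two engines.
engine_1 = ["u1", "u2", "u3", "u4"]
engine_2 = ["u3", "u5", "u6", "u7"]
print(f"{overlap(engine_1, engine_2):.1%}")  # 14.3%
```

Here only one URL out of seven distinct results is shared. Low values like this are what "little overlap" means in practice: consulting a second engine mostly turns up pages the first one never returned.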

This overwhelming, exponential growth is the source of a lot of frustration for would-be researchers. By the time you have devised your own methods for getting around the Internet, things will have changed. Keeping up is, in itself, a full-time job. Thankfully, there are rules of thumb that will stand you in good stead. While online research can be frustrating, it can also be fascinating. As we all know, there is usually more than one way to reach an intended destination. If you are stymied in one direction, the trick is to devise another way to get there.

Note: This article was submitted by Shayne Roys on 08/23/2010.