HOW SEARCH ENGINES WORK

By Dave Wallace

RIVER BENDER - May, 2000

Search engines are one of the great wonders of the Internet. Having been a programmer eons ago, I find it baffling how fast and how well they work. In earlier days, searching and sorting algorithms took an awfully long time to run, and programmers were always looking for ways to speed them up. But with today's techniques and faster personal computers, searching on the Internet has become almost instantaneous and a real pleasure.

A search engine, or web-searching program as it was called in earlier years, is simply a web site that lets you enter keywords to search the World Wide Web. There are hundreds of search engines. Many specialize in particular subject categories and some charge fees, but the most popular ones are free and cover all subjects. Most are supported by advertisements.

Here are the addresses of some of the more popular search engines:

http://www.lycos.com/

http://www.altavista.com/

http://www.yahoo.com/

http://www.metacrawler.com/index.html

How do search engines work? They are quite complicated, and their programming relies on heuristics, that is, educated rules of thumb rather than exact formulas. Some have even patented their search techniques. Basically they operate in three phases: (1) discovery and creation of a database, (2) user input of search words and a search of the database, and (3) presentation of the results.

In the first phase, before a search engine can assist you, it uses software programs called "spiders" or "robots" that traverse the web, retrieving documents and all the documents they reference or link to. Exactly how this is done depends on the particular search engine. Some spiders search more than the web and retrieve information from FTP (File Transfer Protocol) sites and thousands of USENET newsgroups. The information obtained, along with web sites that users have asked to have registered, is entered into the search engine's database. This is an ongoing activity that occurs before you ever access the search engine.
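
For the programmers in the audience, here is a minimal sketch in Python of how a spider might follow links. It assumes a made-up, in-memory "web" (the TOY_WEB table and its page names are invented for illustration) instead of real network requests, and it is not how any particular search engine does the job.

```python
from collections import deque

# Hypothetical toy "web": each page name maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "speakers"],
    "about": ["home"],
    "speakers": ["dave", "home"],
    "dave": [],
    "orphan": ["home"],  # no page links here, so the spider never finds it
}

def crawl(start, web):
    """Visit every page reachable from start, breadth-first, in visit order."""
    seen = {start}          # pages already queued, so none is fetched twice
    queue = deque([start])  # pages waiting to be visited
    visited = []
    while queue:
        page = queue.popleft()
        visited.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("home", TOY_WEB))  # ['home', 'about', 'speakers', 'dave']
```

Notice that the "orphan" page is never visited. A spider that only follows links can find a page only if some page it already knows about links to it.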

The next phase is the user phase, where you come in. You access the search engine's web page and enter keywords for the information you're interested in. You learn through trial and error how to phrase keywords so that they produce the most hits on the information you're looking for. Since each search engine has its own rules, you should always read its help instructions on entering search words.

After receiving your search words, the search engine looks in its database for your subject. Indexing of the database is very important, just as card indexing is important in a library. Alta Vista is said to have indexed over 30 million web pages. Others may have fewer pages but index the full text of those pages. What the spiders put into the database, and how that database is indexed, is what makes search engines different from one another. Sometimes you need to employ several search engines to make a successful search.
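
To show why indexing matters, here is a small Python sketch of one common approach, an "inverted index," which works like the index at the back of a book: each word points to the pages that contain it. The page names and text are made up, and real search engines certainly do far more than this.

```python
from collections import defaultdict

# Hypothetical pages a spider might have collected: name -> page text.
PAGES = {
    "page1": "river bender newsletter for computer users",
    "page2": "computer club meeting speakers",
    "page3": "river fishing report",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in text.lower().split():
            index[word].add(address)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only pages with this word too
    return results

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "computer")))        # ['page1', 'page2']
print(sorted(search(INDEX, "river computer")))  # ['page1']
```

Because the index is built ahead of time, a search only has to look up a few words instead of rereading every page, which is a large part of why results come back so quickly.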

The last phase of a search engine's process is the presentation and ranking of the results found. Sometimes you may find a great search engine whose presentation of the results is poor. For example, it might display far more hits than you could ever wade through, while another search engine might have fewer hits but rank them in order of relevance to your search words. It pays to compare search engines. Even if you find one you don't like, try it again later, because they're always improving and trying to outdo one another.
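
As a rough illustration of ranking by relevance, the Python sketch below scores each page by how many times the search words appear in it and lists the highest scores first. The pages are invented, and this simple word count is only an assumption for the example; actual engines use their own, often proprietary, formulas.

```python
# Hypothetical pages: name -> page text.
PAGES = {
    "a": "search engines search the web",
    "b": "web search tips",
    "c": "gardening tips",
}

def rank(pages, query):
    """Return matching page names, most relevant first.

    Relevance here is simply the number of times the query words occur.
    """
    words = query.lower().split()
    scored = []
    for address, text in pages.items():
        tokens = text.lower().split()
        score = sum(tokens.count(word) for word in words)
        if score > 0:  # leave out pages with no matching words at all
            scored.append((score, address))
    # Highest score first; ties are broken alphabetically by page name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [address for score, address in scored]

print(rank(PAGES, "search web"))  # ['a', 'b']
```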

A "meta search engine" is a different breed from engines like Lycos or Alta Vista. Metacrawler, mentioned above, is a meta search engine that employs eleven of the most popular search engines at the same time and presents their results in order of relevance. Although Metacrawler is my favorite search engine, I'm sure others have their own favorites.
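
How might a meta search engine combine several ranked lists into one? The Python sketch below shows one possible way: each engine's first result earns 1 point, its second 1/2, its third 1/3, and so on, and pages are then listed by total points. This scoring rule and the sample lists are assumptions made for the example. It is not a description of how Metacrawler actually works.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Combine ranked result lists from several engines into one ranking."""
    scores = defaultdict(float)
    for results in result_lists:
        for position, address in enumerate(results, start=1):
            scores[address] += 1.0 / position  # higher placement earns more
    # Highest total first; ties are broken alphabetically.
    return sorted(scores, key=lambda address: (-scores[address], address))

# Hypothetical results from three engines, best match first in each list.
engine_a = ["x", "y", "z"]
engine_b = ["y", "x"]
engine_c = ["y", "w"]

print(merge_results([engine_a, engine_b, engine_c]))  # ['y', 'x', 'w', 'z']
```

A page that several engines agree on, like "y" here, rises to the top, and each page appears only once even if every engine returned it.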

Here are some pointers about search engines:

Most search engines are not limited to finding web sites that have been registered with them. In other words, their spider or robot may pick up your personal web site even though it is not registered anywhere. For example, I once entered my name in a search engine, and it came up with a page of past speakers, of whom I was one, that was linked from the NBCUG web page. That suggests that you should never put anything on a web page that you don't want the world to see. Leaving a web page unregistered provides no protection from spiders gathering information.

Registering your web site with a search engine does, however, improve the hits your site gets from that engine. If you register with several popular search engines yourself, it is questionable whether paying a company to register your site with fifty or so search engines is worthwhile. Besides, as pointed out, the search engine spiders will probably pick up your site anyway.

The design of a web page also affects whether a spider picks it up. Meta tags in HTML (Hypertext Markup Language) that list keywords describing the content of your web site may help, but they are not absolutely required. The same keywords, used repeatedly on the main page, will likely accomplish the same result.
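
For the curious, here is a short Python sketch of how a spider could read the keywords meta tag from a page, using the HTMLParser class from Python's standard library. The sample page and the KeywordReader and read_keywords names are made up for illustration, and whether a given search engine pays any attention to these keywords is up to that engine.

```python
from html.parser import HTMLParser

class KeywordReader(HTMLParser):
    """Collect the keywords listed in <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Keywords are conventionally separated by commas.
            self.keywords.extend(
                word.strip() for word in content.split(",") if word.strip()
            )

def read_keywords(html):
    """Return the list of meta keywords found in an HTML page."""
    reader = KeywordReader()
    reader.feed(html)
    return reader.keywords

# Hypothetical page for the example.
SAMPLE_PAGE = (
    "<html><head><title>Computer Club</title>"
    '<meta name="keywords" content="computers, user group, newsletter">'
    "</head><body>Welcome!</body></html>"
)

print(read_keywords(SAMPLE_PAGE))  # ['computers', 'user group', 'newsletter']
```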