Where did the spider and the web come from?
How Do Search Engines Really Work?

by Suzon Walton
Publisher of Connected Now

Connected Now does not currently list every business in Sacramento, because our database consists of paid subscribers (those who choose to pay to be listed). In that respect we work much like Google's sponsored listings. While we wait for the remaining local businesses with websites to jump on board, we offer tips and tricks to help you find what you are looking for.

Search engines for the general web do not really search the World Wide Web directly. Each one searches a database of the full text of web pages selected from the billions of web pages residing on servers. When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page. When you click on a link in a search engine's results, you retrieve the current version of the page from its server.

Search engine databases are selected and built by computer robot programs called spiders. Although it is said they "crawl" the web in their hunt for pages to include, in truth they stay in one place. They don't have eight legs but millions. They find pages for potential inclusion by following the links in the pages they already know about. They cannot think, type a URL, or use judgment to "decide" to go look something up. They are still brainless and do only what humans tell them to do (for now).

If a web page is never linked to from any other page, search engine spiders can't find it. The only way a brand new page (one that no other page has ever linked to) can get into a search engine is for a human to send its URL to the search engine companies with a request that the new page be included. All search engine companies offer ways to do this. The process of sending this information is commonly called "search engine submission". Experts in submitting web pages can command a high price for the job, although computer programs are now making it easy. We will cover search engine submission in next month's issue of Connected Now.
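
The link-following behavior described above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine's spider is written: the miniature "web" below is made up (none of the example URLs exist), nothing is fetched over the network, and the names TOY_WEB, LinkCollector, and crawl are invented for this sketch.

```python
from collections import deque
from html.parser import HTMLParser

# A made-up miniature web: URL -> HTML. None of these pages exist.
TOY_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": '<a href="http://b.example/">B</a>',
    "http://new.example/": '<a href="http://a.example/">A</a>',  # nobody links here
}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls):
    """Return the set of pages reachable by following links from the seeds."""
    found = set()
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url in found or url not in TOY_WEB:
            continue  # already visited, or not part of the toy web
        found.add(url)
        collector = LinkCollector()
        collector.feed(TOY_WEB[url])
        queue.extend(collector.links)
    return found


# Starting from one known page, the unlinked page is never discovered.
known = crawl(["http://a.example/"])
print("http://new.example/" in known)

# "Submitting" the URL simply hands the spider one more starting point.
known = crawl(["http://a.example/", "http://new.example/"])
print("http://new.example/" in known)
```

The spider never leaves its own machine; it just keeps a list of addresses to visit, which is why a page with no inbound links stays invisible until someone submits it.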

After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in the page and stores it in the search engine database's files. The database can then be searched by keyword, and by whatever more advanced approaches are offered, and the page will be found if your search matches its content. Some types of pages and links are excluded from most search engines by policy. Others are excluded because search engine spiders cannot access them. Pages that are excluded are referred to as the "Invisible Web": what you don't see in search engine results. The Invisible Web is estimated to be two to three or more times bigger than the visible web.
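
One common way to make page text searchable by keyword is an "inverted index," which maps each word to the pages that contain it. The sketch below assumes that approach for illustration; the article does not say which structure any particular engine uses, and the pages and the names PAGES, tokenize, build_index, and search are made up.

```python
import re
from collections import defaultdict

# Made-up pages standing in for what a spider handed over.
PAGES = {
    "http://a.example/": "Sacramento florist with same-day delivery",
    "http://b.example/": "Sacramento bakery and coffee shop",
    "http://c.example/": "Coffee roaster in Davis",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "sacramento coffee")))  # ['http://b.example/']
```

A page that was never handed to the indexer has no entries in the index at all, which is the mechanical reason the Invisible Web does not show up in results.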

Google has one of the largest databases of Web pages, including many other types of Web documents (e.g., PDF, Word, PowerPoint, and Excel documents). Despite the presence of many advertisements and considerable clutter, Google's popularity ranking often makes pages worth looking at rise near the top of search results. Google alone is not sufficient, however. Less than half the searchable Web is fully searchable in Google. Overlap studies show that about half of the pages in any search engine database exist only in that database. Getting a second opinion is therefore often worthwhile. For a second opinion, we recommend Teoma (www.teoma.com/); Vivisimo (www.vivisimo.com/), a meta-search engine that indirectly searches three huge search engine databases; or Yahoo! Search (www.yahoo.com/).
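
The idea behind an overlap study, and behind a meta-search engine, comes down to comparing sets of pages. The sketch below uses three imaginary engines and made-up page IDs; the numbers it prints describe only this toy data, not any real engine, and the function names are invented for illustration.

```python
# Made-up result sets for three imaginary engines; real studies compare
# far larger samples.
ENGINE_PAGES = {
    "engine_one": {"p1", "p2", "p3", "p4"},
    "engine_two": {"p3", "p4", "p5", "p6"},
    "engine_three": {"p4", "p6", "p7", "p8"},
}


def unique_share(engine, databases):
    """Fraction of an engine's pages that no other engine holds."""
    others = set()
    for name, pages in databases.items():
        if name != engine:
            others |= pages
    own = databases[engine]
    return len(own - others) / len(own)


def merged_results(databases):
    """What a meta-search engine aims for: every page, listed once."""
    merged = set()
    for pages in databases.values():
        merged |= pages
    return merged


for name in ENGINE_PAGES:
    print(name, unique_share(name, ENGINE_PAGES))
print(len(merged_results(ENGINE_PAGES)), "distinct pages across all engines")
```

In this toy data each engine holds four pages, yet the three together cover eight, which is the practical argument for a second opinion.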

YAHOO
Yahoo! is one of the best known and most popular Internet portals. Originally just a subject directory, it now is a search engine, directory, and portal. To go to the Yahoo! portal and main starting point, use www.yahoo.com. For direct access to the search engine, use search.yahoo.com and for the directory use dir.yahoo.com.

    Strengths:
  • A very large, new (as of Feb. 2004) search engine database
  • Includes cached copies of pages
  • Also includes links to the Yahoo! directory
  • Supports full Boolean searching
    Weaknesses:
  • Lack of some advanced search features such as truncation
  • Only indexes first 500 KB of a Web page (still more than Google's 101KB)
  • Link searches require the inclusion of the http://
  • Includes some pay for inclusion sites
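
"Full Boolean searching" and "truncation" are easier to picture with a small example. The sketch below treats each search term as a set of matching pages, so AND, OR, and NOT become ordinary set operations and nesting is just combining them. The index contents and the helper names (term, truncated, and_, or_, not_) are made up for illustration and do not reflect any engine's actual query syntax.

```python
# Made-up index (word -> pages containing it); not data from any engine.
INDEX = {
    "coffee": {"p1", "p2", "p3"},
    "coffeehouse": {"p4"},
    "tea": {"p3", "p5"},
    "sacramento": {"p1", "p3", "p4"},
}
ALL_PAGES = {"p1", "p2", "p3", "p4", "p5"}


def term(word):
    """Pages containing exactly this word."""
    return INDEX.get(word, set())


def truncated(prefix):
    """Truncation: pages containing any word that starts with the prefix."""
    pages = set()
    for word, hits in INDEX.items():
        if word.startswith(prefix):
            pages |= hits
    return pages


def and_(a, b):
    return a & b


def or_(a, b):
    return a | b


def not_(a):
    return ALL_PAGES - a


# (coffee OR tea) AND sacramento
nested = and_(or_(term("coffee"), term("tea")), term("sacramento"))
# coffee AND NOT tea
excluded = and_(term("coffee"), not_(term("tea")))
# coffee* (truncation)
stemmed = truncated("coffee")

print(sorted(nested), sorted(excluded), sorted(stemmed))
```

An engine without truncation would return only the exact-word matches for "coffee" and miss the page that mentions "coffeehouse".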

TEOMA
Debuting in Spring 2001 and relaunching in April 2002, this new search engine has built its own database and offers some unique search features. It was bought by Ask Jeeves in Sept. 2001. It lacks full Boolean and other advanced search features, but it has more recently expanded and improved its search features and added an advanced search.

    Strengths:
  • Identifying metasites
  • Refine feature to focus on Web communities
    Weaknesses:
  • Smaller database
  • No free URL submission
  • No ability to uncluster results to easily see more than two hits per site
  • No cached copies of pages

GOOGLE
Google has become the favorite Web search engine for many. Since Feb. 1999, it has made its mark with its relevance ranking based on link analysis, cached pages, and aggressive growth. Today, it boasts more than 4 billion indexed pages, unindexed URLs, and files in other formats.

    Strengths:
  • Size and scope: It is now the largest, and includes PDF, DOC, PS, and many other file types
  • Relevance based on sites' linkages and authority
  • Cached archive of Web pages as they looked when indexed
  • Additional databases: Google Groups, News, Directory, etc.
    Weaknesses:
  • Limited search features: no nesting, no truncation, does not support full Boolean
  • Link searches must be exact and are incomplete
  • Only indexes first 101 KB of a Web page and about 120 KB of PDFs
  • May search for plural/singular, synonyms, and grammatical variants without telling you
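
The size limits quoted in the lists above (101 KB for Google, 500 KB for Yahoo!) matter because anything past the cutoff never reaches the index. The sketch below illustrates the effect on a made-up long page. Treating 1 KB as 1,024 bytes and cutting at an exact byte count are assumptions made for the example; the article does not say how the engines measure the limit.

```python
# Limits quoted in the lists above; 1 KB = 1024 bytes is an assumption.
GOOGLE_LIMIT = 101 * 1024
YAHOO_LIMIT = 500 * 1024


def indexed_words(page_text, limit_bytes):
    """Words a spider would keep if it stops reading at limit_bytes."""
    kept = page_text.encode("utf-8")[:limit_bytes]
    # errors="ignore" drops a character that the cut may have split in half.
    return set(kept.decode("utf-8", errors="ignore").lower().split())


# A made-up long page: about 210 KB of filler, then one distinctive word.
page = "filler " * 30000 + "zymurgy"

print("zymurgy" in indexed_words(page, GOOGLE_LIMIT))  # False: past the cutoff
print("zymurgy" in indexed_words(page, YAHOO_LIMIT))   # True: within the cutoff
```

For very long pages, a search for a word that appears only near the end can therefore fail in one engine and succeed in another.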

MSN SEARCH
MSN Search is the search engine for the MSN portal site. It uses an Inktomi database. The basic search screen shows only a few options, but choosing the Advanced Search link displays the full range of search features. This review discusses the full set of options, some of which are available only in the Advanced Search.

Databases: MSN Search uses LookSmart for its directory and Inktomi for its search engine database. Its sponsored sites (ads) are from Overture. On the basic search screen, MSN Featured Sites and Directory results come first. The Advanced Search displays only Inktomi results. Before Sept. 2002, it used to include a link to Direct Hit results. In the Featured Sites section, results may come from MSN destination sites, MSN Encarta, and/or MSN ad partners. Note that MSN will not retrieve adult content, and that searches on terms such as 'sex' will give no results but will link to an adult search engine.

HOTBOT
HotBot, owned by Terra/Lycos, is one of the older Web search engines. Originally it used only the Inktomi database; it later added Direct Hit and the Open Directory.

    Strengths:
  • Advanced searching capabilities
  • Quick check of three major databases
  • Advanced search help
    Weaknesses:
  • Does not include all advanced features of each of the four databases
  • No cached copies of pages
  • Only displays a few hits from each domain with no access to the rest in Inktomi
  • Same ads at the top push regular results below the fold
  • Should have a file type limit for PDF, MS Word, PowerPoint, and Excel files

Contact Suzon Walton at suzon@connectednow.com

