4 What's going on

4.4 Quality, not quantity

4.4.1 Ranking

It is common for a web search to return hundreds, or even millions, of hits; certainly too many to check. But uncannily, the first few hits often contain just what you were looking for. How do search engines manage this seemingly miraculous feat?

The answer lies in techniques used to rank pages so that the ‘best’ are first. Each search engine uses different techniques, tweaks them continually, and guards the details jealously. But there are some general principles.

One approach is to use information that is on the original page. For example, greater weight can be attached to words that appear in the page title or in headings, or when a word occurs frequently on a single page.
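As a rough illustration of this idea, the sketch below scores a page by counting how often each search word appears in its title, headings and body, with title and heading matches counting for more. The weights and the function names (`on_page_score`, `tokenize`) are invented for this example; real search engines do not publish theirs.

```python
import re

# Hypothetical weights, chosen only for illustration.
TITLE_WEIGHT = 5.0
HEADING_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def on_page_score(query, title, headings, body):
    """Score one page for a query using only the words on the page itself."""
    terms = set(tokenize(query))
    title_words = tokenize(title)
    heading_words = [word for h in headings for word in tokenize(h)]
    body_words = tokenize(body)

    score = 0.0
    for term in terms:
        # A word in the title or a heading counts for more than one in the body,
        # and every extra occurrence adds to the score.
        score += TITLE_WEIGHT * title_words.count(term)
        score += HEADING_WEIGHT * heading_words.count(term)
        score += BODY_WEIGHT * body_words.count(term)
    return score
```

For instance, a page titled 'Gorilla conservation' with the heading 'Mountain gorilla habitat' would score far higher for the search 'gorilla' than a page that mentions the word once in passing.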

A different approach is to draw on human activity, albeit indirectly. For example, Google weights a page more heavily if it finds that other pages link to it, on the assumption that people create links only to pages that have proved useful. This can be applied in a circular manner, so links are worth more if they come from pages that are themselves highly ranked. Another method is to note whenever a page is chosen from the results list and weight that page more heavily; after all, if one person thought it useful, it is likely that someone else will too.
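The 'circular' use of links can be sketched as a repeated calculation: every page starts with an equal score, and on each pass a page shares its current score among the pages it links to. The code below is a simplified illustration of that general idea, not the algorithm any particular search engine uses. The function name `link_rank`, the `damping` value and the number of passes are assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Estimate page importance from links alone.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative values, not published settings.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among the pages it links to,
                # so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

With `links = {"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` comes out on top because two pages link to it. Page `a` comes second because its single link is from the highly ranked `c`, and page `b`, which nothing links to, comes last.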

Most search engines also combine information from their directory and spidered index. Pages that appear in the directory can be given additional weight and so appear near the top of the hit list in a search. Google applies ranking to its directory, so not only are sites in the directory manually chosen, but they are ranked in order using the techniques above.
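One simple way to picture this extra weight is as a bonus added to the score of any hit that also appears in the directory. The sketch below assumes each hit already has a numeric score; the bonus value and the name `rerank_with_directory` are hypothetical.

```python
# Hypothetical bonus for pages that editors have placed in the directory.
DIRECTORY_BONUS = 2.0


def rerank_with_directory(hits, directory_urls):
    """Re-order (url, score) hits, favouring pages listed in the directory."""
    boosted = []
    for url, score in hits:
        if url in directory_urls:
            score = score + DIRECTORY_BONUS
        boosted.append((url, score))
    # Highest score first, as in a results list.
    return sorted(boosted, key=lambda hit: hit[1], reverse=True)
```

A hit with a slightly lower score from the spidered index can therefore overtake others if it has also been chosen for the directory.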

Activity 28

Why do you think search engines are reluctant to reveal details of their ranking techniques?

4.4.2 Query rewriting

Some search sites provide additional features that help you to refine your search. For example, the search site can keep a log of all the searches that people make. When you enter a search, the search engine can then suggest similar searches that other people have made.
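A minimal sketch of how such suggestions might be drawn from a search log is shown below. It treats two searches as 'similar' if they share at least one word, and prefers searches that share more words and have been made more often. Both that definition of similarity and the name `suggest_related` are assumptions for this example; a real search site is likely to use something more sophisticated.

```python
from collections import Counter


def suggest_related(query, query_log, limit=3):
    """Suggest past searches that share words with the current search."""
    current = query.lower().strip()
    query_words = set(current.split())

    # Count how often each past search was made.
    counts = Counter(past.lower().strip() for past in query_log)

    candidates = []
    for past, count in counts.items():
        if past == current:
            continue  # no point suggesting the search just entered
        shared = len(query_words & set(past.split()))
        if shared:
            candidates.append((shared, count, past))

    # Most shared words first, then most popular, then alphabetical.
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))
    return [past for _, _, past in candidates[:limit]]
```

Given a log containing 'mountain gorilla' (twice), 'gorilla habitat', 'gorilla' and 'cheap flights', a search for 'gorilla' would be offered 'mountain gorilla' and then 'gorilla habitat'.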

Activity 29

Go to AltaVista and search for a topic that interests you. Look at the related queries. Would any of these help you to widen your search?

Another possibility is to extend your search from a page that has proved useful. For example, Google offers a ‘Similar pages’ search.

A screen dump of part of a Google results page. The ‘Similar pages’ link is circled in one of the results: ‘The Dian Fossey Gorilla Fund International. The Dian Fossey Gorilla Fund International is dedicated to the conservation and protection of the endangered mountain gorilla and its habitat in East Central … www.gorillafund.org/ – 7k – Cached – Similar pages.’

[www.google.com]

Another approach is to make use of the directory's categories. For example, a hit found from a full-text search that also appears in the directory may be shown with a link to the appropriate directory category. By following this link you can see pages that researchers have put in the same category.

[www.yahoo.com]

Search sites are continually looking for ways in which to improve the quality of their results, so we can expect new techniques to appear.

Activity 30

You have learnt some of the techniques used by search engines. How can you use that knowledge to help you find information you want?

Last modified: Thursday, 2 August 2012, 12:30 PM