2 How to do it
2.3 Searching for information on the Web
What do you do if you don't know the URL of the website you are looking for, or haven't been able to browse to it? The Web is not like a library – it isn't carefully organised and catalogued, and it is growing all the time. Luckily, there are search sites that can help you find what you want.
Visit the Excite home page. Spend no more than a few minutes getting a sense of what information is available from this page.
Some websites are set up as web portals: they aim to provide you with a one-stop shop for everything on the Web. They provide their own editorial material, news headlines, weather and other up-to-date information, as well as links to commercial partners and paid advertisements. They also provide a search facility – did you spot the one on Excite?
Internet service providers (ISPs) usually provide their own portal and may configure your browser to use it as your home page. (Note that ‘home page’ here means the default page your browser displays when first started.)
2.3.2 Search sites
Other sites such as Google and Yahoo! concentrate on providing search facilities.
Yahoo! started out as a list of useful websites put together by two Stanford University students, but has grown somewhat since then. It still offers a web directory: a huge list of useful web pages, collected by Yahoo! staff, that you can browse. The directory is organised like a classified phone directory, except that each category can be browsed in successively greater detail.
Visit Yahoo! Find the Web Site Directory (see image above) and follow links to a topic that interests you.
For example, I followed these links: Science > Animals > Mammals > Primates > Apes > Gorillas to reach this page:
Web directories can be a useful starting point if you are looking for information in a general area. If you are looking for more specific information or want to look more widely, a full-text search engine provides an alternative.
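If it helps to picture how a directory is organised, the chain of links in the example above can be modelled as categories nested inside categories. The following is only a minimal sketch: the category names come from the example path, while the page title, the `DIRECTORY` structure and the function names are made up for illustration and do not describe how Yahoo! stores its directory.

```python
# A toy web directory: each category maps to its subcategories.
# The category names follow the example path in the text; the page
# title at the end is a made-up placeholder.
DIRECTORY = {
    "Science": {
        "Animals": {
            "Mammals": {
                "Primates": {
                    "Apes": {
                        "Gorillas": ["Example gorilla page"],
                    },
                },
            },
        },
    },
}


def browse(directory, path):
    """Follow a list of category names and return what is found there."""
    node = directory
    for category in path:
        node = node[category]  # raises KeyError if the category is missing
    return node


def subcategories(directory, path):
    """List the choices offered at a given level of the directory."""
    node = browse(directory, path)
    return sorted(node) if isinstance(node, dict) else []
```

Calling `subcategories(DIRECTORY, ["Science", "Animals"])` lists the choices at that level, and passing the full path to `browse` returns the pages filed under Gorillas. Each step narrows the topic, which is what 'successively greater detail' means in practice.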
Full-text search: a search that examines the full text of the original source, rather than only the keywords associated with it.
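The difference between the two kinds of search can be illustrated with a small sketch. The two documents, their keywords and the function names below are all invented for the purpose; real search sites work on a vastly larger scale and in more sophisticated ways.

```python
# Hypothetical mini-collection: each document has hand-assigned keywords
# and a body of text.
DOCUMENTS = [
    {
        "title": "Great apes",
        "keywords": {"apes", "primates"},
        "text": "Gorillas and chimpanzees are great apes that live in Africa.",
    },
    {
        "title": "Mountain wildlife",
        "keywords": {"mountains", "wildlife"},
        "text": "Goats and eagles are common at high altitude.",
    },
]


def keyword_search(term, documents):
    """Match only against the keywords attached to each document."""
    term = term.lower()
    return [d["title"] for d in documents if term in d["keywords"]]


def full_text_search(term, documents):
    """Match against every word in the body of each document."""
    term = term.lower()
    hits = []
    for d in documents:
        # Split the text into words, dropping trailing punctuation.
        words = {w.strip(".,").lower() for w in d["text"].split()}
        if term in words:
            hits.append(d["title"])
    return hits
```

Here `keyword_search("gorillas", DOCUMENTS)` finds nothing, because nobody assigned that keyword, whereas `full_text_search("gorillas", DOCUMENTS)` finds the 'Great apes' document because the word appears in its text.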
Search engines attempt to search all the text on all the pages of the Web. They use software spiders to seek out and index web pages, storing the results in huge databases. We will see how this is done later in this unit.
Spiders: programs that crawl over the Web, fetching web pages by following links. Search engines use spiders to find pages for indexing.
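To give a feel for what a spider does, here is a minimal sketch that crawls a made-up 'web' of three pages held in memory and builds an index of which words appear on which pages. The URLs, the `TOY_WEB` data and the `crawl` function are assumptions for illustration only; a real spider fetches pages over the network and copes with far more complications.

```python
from collections import deque

# A hypothetical three-page web: URL -> (page text, links on the page).
TOY_WEB = {
    "http://example.org/": ("welcome to the zoo", ["http://example.org/apes"]),
    "http://example.org/apes": (
        "gorillas are apes",
        ["http://example.org/", "http://example.org/gorillas"],
    ),
    "http://example.org/gorillas": ("gorillas eat plants", []),
}


def crawl(start_url, web):
    """Visit every page reachable from start_url and index its words."""
    index = {}                  # word -> set of URLs containing it
    seen = {start_url}          # pages already queued, to avoid loops
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web[url]  # stands in for fetching the page
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

After `index = crawl("http://example.org/", TOY_WEB)`, looking up `index["gorillas"]` gives the two pages that mention gorillas. Note that the spider only finds pages it can reach by following links: starting from the gorillas page, which links nowhere, it would index that page alone.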
Visit Google. Search for a topic that interests you.
For example, I searched for ‘gorilla’, with the following results:
Search sites often provide both directories and full-text search, and will combine results to offer you the best of both worlds.
2.3.3 Search results
Let us look at the results returned by a search engine. I've chosen to use Google, but you may use another search engine; the layout is likely to be different in detail but most of the same elements will be present.
Visit Google or another search site and search for a topic that interests you.
The page of results may include hits from several different sources. Google, for example, may include some results from current news stories. Search sites will often include prominent results that are paid for by advertisers.
Hits: documents that meet your search criteria.
Look at some results. Can you distinguish those that come from advertisers?
On Google, some results are marked as ‘sponsored links’ – businesses and organisations pay Google a fee so that their pages are associated with particular keywords.
Each hit returned provides several pieces of information to help you decide whether to visit the page. This may include the title of the page, a short extract with search terms highlighted, and the domain (which will give you clues to the publisher of the page). For pages that also appear in the search site's directory there may be a short description and a link to the category in which the page appears.
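The pieces of information shown for each hit can be assembled quite simply. The sketch below is one possible way of doing it, not a description of how any particular search site works: the function names, the use of asterisks to mark search terms and the 60-character extract length are all assumptions chosen for illustration.

```python
import re
from urllib.parse import urlparse


def make_extract(text, terms, width=60):
    """Return the start of text with each search term marked by asterisks."""
    extract = text[:width]
    if not terms:
        return extract
    # One combined pattern, so that no term is marked twice.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(lambda m: "*" + m.group(0) + "*", extract)


def describe_hit(title, url, text, terms):
    """Collect the pieces of information shown for one hit."""
    return {
        "title": title,
        "extract": make_extract(text, terms),
        "domain": urlparse(url).netloc,  # the host part of the URL
    }
```

For instance, `describe_hit("Gorilla facts", "http://www.example.org/gorillas.html", "The gorilla is the largest living primate.", ["gorilla"])` produces the title, an extract with the word gorilla marked, and the domain `www.example.org`.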
A search engine may return a huge number of hits, but surprisingly often the information you wanted can be found by following one of the first few links. This is because the results are ranked to offer you the ‘best’ first.
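What counts as 'best' varies from one search engine to another, and the details are generally not published. As a deliberately simple stand-in, the sketch below ranks pages by how many times the search term appears in them. The page data and the `rank_hits` function are invented for illustration, and counting words is not claimed to be the method used by Google or any other search site.

```python
# Hypothetical pages: title -> text.
PAGES = {
    "Zoo guide": "gorilla enclosure and gorilla feeding times and gorilla talks",
    "Primates": "a gorilla is a primate",
    "Weather": "rain expected today",
}


def rank_hits(term, pages):
    """Order pages containing term by how often it appears, highest first."""
    term = term.lower()
    scored = []
    for title, text in pages.items():
        count = text.lower().split().count(term)
        if count > 0:  # pages without the term are not hits at all
            scored.append((count, title))
    # Sort by descending count; ties fall back to alphabetical title order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for count, title in scored]
```

With this data, `rank_hits("gorilla", PAGES)` puts 'Zoo guide' ahead of 'Primates' and leaves out 'Weather' altogether.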