2 How to do it
2.2 Browsing for information on the Web
One way to find what you are looking for on the Web is to start from sites that you know are likely to have useful ‘links’ on them, such as the main Open University pages or the Open University Library pages. These opening pages are known as home pages and are a bit like the contents page of a book. The home page usually gives you some information about the content of the website, often with links to other pages of information held on that site and on sites elsewhere. By clicking on a link (which might be a heading, underlined or differently coloured words, an image, or an item in a drop-down menu), you are actually sending a request to the computer that holds that information, asking it to send the page to your screen. Following the links that are presented to you on the screen is called ‘browsing’ – which you have been doing already.
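The request your browser sends when you click a link is just a short piece of text, written according to the rules of HTTP. The Python below is a minimal sketch that assembles such a request for a given address. It only builds the message and does not send it, and the two headers it includes (Host and Connection) are a simplifying assumption, not what any particular browser actually sends.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for url (not sent anywhere)."""
    parts = urlsplit(url)
    path = parts.path or "/"            # an empty path means the site's home page, "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",         # request line: method, page wanted, protocol version
        f"Host: {parts.hostname}",      # which website the request is for
        "Connection: close",            # simplifying assumption: one request per connection
        "",                             # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


print(build_get_request("http://www.bbc.co.uk"))
```

The server that receives this text replies with the page, which your browser then draws on screen.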
A home page is either the first page in a website or the page that your browser loads when it first starts.
A link is a cross-reference from one document to another, particularly between web pages. Links on websites are often shown as coloured and underlined text; clicking on the link loads the new page. The terms 'link' and 'hyperlink' are now used synonymously.
You use what is on the screen to guide you to what you want.
Another quick way of accessing web pages is to use the URL (Uniform Resource Locator) of the page you want to look at. Type the URL (also sometimes called the web address) into the location/address box of your browser, press Enter, and the page is displayed on screen.
2.2.1 Making URLs work for you
Each page on the Web has a unique address, rather like a postcode or telephone number. ‘Locator’ is the important word in ‘Uniform Resource Locator’ since it can give you clues as to where you are within a website (for example, are you on the home page or are you further in?), what you are looking at, and the source of the information you are viewing (for example, whether it is from an academic institution or a company). Not only that, but if you understand how a URL is structured you can use the principles to deduce what a company or organisation's web address might be.
All website names are part of the Domain Name System (DNS), and usually look something like this:
http://www.bbc.co.uk
DNS, or Domain Name System
A system which translates between domain names (such as www.open.ac.uk) and numeric IP addresses (such as 220.127.116.11).
We can break down this long address and examine the individual parts to find out what we are looking at.
- http:// tells us that we are looking at a website – http stands for hypertext transfer protocol, the ‘protocol’ or set of rules used by the computer to access and deliver web pages.
- www.bbc tells us that we are looking at a website held on a computer (also called a ‘web server’) known as ‘www’ belonging to an organisation called ‘bbc’. A web server computer is often called ‘www’, but occasionally something more specific; for example, the T180 website is on a server called ‘students’.
- .co.uk tells us that we are looking at the website of a company (‘co’) in the UK. This part of the address is called the ‘domain’. Examples of other domains you may come across include .edu or .ac (educational or academic); .com (commercial); .gov (government); .org (non-governmental, non-profit-making organisations). These might be followed by a country code, such as .uk, .au (for Australia), or .fr (for France), that can indicate the country the organisation is based in.
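The breakdown above can be sketched in Python. This is a minimal illustration that assumes a simple host name of the form server.organisation.domain with an optional country code at the end. The two lookup tables are hypothetical and cover only the labels mentioned in the text. Real addresses vary more than this, so treat the output as an educated guess rather than a definitive reading.

```python
from urllib.parse import urlsplit

# Hypothetical lookup tables covering only the labels mentioned in the text.
DOMAIN_TYPES = {
    "co": "company", "com": "commercial",
    "ac": "academic", "edu": "educational",
    "gov": "government", "org": "non-profit organisation",
}
COUNTRY_CODES = {"uk": "UK", "au": "Australia", "fr": "France"}


def describe_url(url: str) -> dict:
    """Split a full URL (including http://) into the parts described above."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")      # "www.bbc.co.uk" -> ["www", "bbc", "co", "uk"]
    country = None
    if labels[-1] in COUNTRY_CODES:         # an optional country code comes last
        country = COUNTRY_CODES[labels.pop()]
    domain_type = DOMAIN_TYPES.get(labels.pop(), "unknown")
    organisation = labels.pop()             # the label just before the domain
    server = ".".join(labels) or None       # whatever is left names the server
    return {
        "protocol": parts.scheme,
        "server": server,
        "organisation": organisation,
        "type": domain_type,
        "country": country,
    }


print(describe_url("http://www.bbc.co.uk"))
```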
Once you get past the home page of an organisation or institution, an address might take the form:
http://www.inlandrevenue.gov.uk/childbenefit/eligible.htm
We can break down the second part of the address:
The pathname refers to the folder in which the document or file is to be found. In this instance the document name is ‘eligible.htm’, and it is stored within a folder called ‘childbenefit’ on a computer belonging to an organisation called ‘inlandrevenue’ in the UK government domain. The pathname may have several parts to it if the file is in a ‘nested folder’, as shown below.
Look at this URL. What information can you deduce from it?
http://www.bbc.co.uk/science/genes/gm_genie/gm_science/index.shtml
This document is called ‘index.shtml’ and is contained within a folder called ‘gm_science’, which itself is contained within other folders called ‘gm_genie’, ‘genes’ and ‘science’, on a computer belonging to the BBC, which is a company in the UK.
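A pathname is read in the opposite order to the answer above. The folders appear from outermost to innermost, and the document name comes last. The Python below is a minimal sketch of that split using the standard library's `urlsplit` and `PurePosixPath`. The two addresses are reconstructed from the descriptions in the text, assuming both servers are called ‘www’, so treat them as illustrative only.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def split_path(url: str) -> tuple[list[str], str]:
    """Return (folders from outermost to innermost, document name).

    Assumes the URL ends in a document name rather than a trailing '/'.
    """
    path = PurePosixPath(urlsplit(url).path)
    folders = [part for part in path.parent.parts if part != "/"]
    return folders, path.name


# Illustrative addresses reconstructed from the descriptions in the text.
for url in [
    "http://www.inlandrevenue.gov.uk/childbenefit/eligible.htm",
    "http://www.bbc.co.uk/science/genes/gm_genie/gm_science/index.shtml",
]:
    folders, document = split_path(url)
    print(document, "is in", " > ".join(folders))
```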
2.2.2 More on URLs
It is often possible to make an intelligent guess about a URL if you know the name of a company or organisation, but you may find some surprises. Some tips:
- domain names can't contain spaces or punctuation other than hyphens (and the dots that separate their parts), so try running words together
- UK companies are usually in the .co.uk domain, but may also use .com
- organisations may use .co or .com rather than .org
- companies and organisations often also own variants on their domain name and will automatically redirect you to the correct website.
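The first two tips can be turned into a rough guessing routine. The Python below is a sketch under stated assumptions. It lower-cases the name, spells out ‘&’ as ‘and’ (an assumption, not a rule), removes spaces and punctuation so that the words run together, and then tries the .co.uk and .com domains. It produces candidates to type in. It does not check that any of them exist.

```python
import re


def candidate_urls(name: str) -> list[str]:
    """Suggest web addresses worth trying for a name. This is a guess, not a lookup."""
    text = name.lower().replace("&", "and")   # assumption: '&' is usually spelled out
    label = re.sub(r"[^a-z0-9]", "", text)    # drop spaces and punctuation, running words together
    # UK companies usually use .co.uk but may also use .com, so try both.
    return [f"http://www.{label}{suffix}/" for suffix in (".co.uk", ".com")]


for name in ["Tesco", "Marks & Spencer"]:
    print(name, candidate_urls(name))
```

A guess like this can still lead to an unrelated site that happens to own the name, so check what actually loads.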
Try to deduce the URLs for the following:
- the Guardian newspaper
- The Times newspaper
- Tesco
- Marks & Spencer
- the Guardian: http://www.guardian.co.uk/ (and not www.guardian.com)
- The Times: http://www.timesonline.co.uk/ (and not www.times.co.uk, which is a totally unrelated site whose page counter claims more than 12 million visitors – perhaps many of them were looking for The Times)
- Tesco: http://www.tesco.com/ (but www.tesco.co.uk automatically redirects visitors to the correct URL)
- Marks & Spencer: http://www.marksandspencer.com/ (and not www.marks&spencer.com)