4 What's going on

4.7 Soundex

You will have seen that most genealogical websites and software can match names despite differences in spelling. For example, FamilySearch provides a check box to choose matches using exact spelling; if this box is not checked, results include many variant spellings. How is this done?

Most genealogy software uses the Soundex method, which has also been used to index the US census back to 1900. Soundex converts names that sound similar into the same code, and the search is then performed on the coded name rather than on the spelling.

Soundex

A method of coding surnames so that names that sound similar but are spelled differently are given the same code. For example, Brown, Browne and Braun are all coded as B650.

Activity 32

Try out an online Soundex calculator. Enter your surname in the Calculate Your Soundex Code box, click on Calculate Soundex and make a note of the result.

The Soundex code consists of a letter and three digits. The letter simply preserves the initial letter of the name. The digits are obtained from the remaining letters according to rules: letters that sound similar (such as S and Z, P and V, or M and N) are converted to the same digit; vowels and some other letters (H, W and Y) are ignored; adjacent letters that share a digit are counted only once; and the code is truncated, or padded with zeros, to be four characters long.

For example, Rosewell is encoded as R240, where the 2 represents the S and the 4 represents the L; all other letters apart from the initial R are ignored and a 0 added to pad the code to four characters. The same code represents other similar names such as Rosewall, Roswell, Rowswell and even Russell.
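The rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used by any particular website. It assumes the common "American Soundex" letter groups (B F P V = 1; C G J K Q S X Z = 2; D T = 3; L = 4; M N = 5; R = 6) and the convention that H and W do not separate two letters with the same digit, whereas a vowel does. The names `CODES` and `soundex` are chosen here for illustration.

```python
# Letter groups assumed for this sketch (American Soundex).
GROUPS = ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"]
CODES = {letter: str(digit)
         for digit, letters in enumerate(GROUPS, start=1)
         for letter in letters}


def soundex(name: str) -> str:
    """Return a four-character Soundex code, or "" if name has no A-Z letters."""
    # Keep only the plain letters A-Z; accented letters are simply dropped here.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    # The initial letter's own digit counts when looking for repeats.
    previous = CODES.get(first, "")

    for c in letters[1:]:
        if c in "HW":
            continue  # H and W are skipped and do not break a run of repeats
        code = CODES.get(c, "")  # vowels and Y have no digit
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so a later repeat is coded again

    # Pad with zeros, then truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


for surname in ["Rosewell", "Rosewall", "Roswell", "Rowswell", "Russell"]:
    print(surname, soundex(surname))
```

With these assumptions, every name in the loop prints the code R240, matching the example above, and Brown, Browne and Braun all give B650.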

The US National Archives and Records Administration provides a detailed description of how soundex works.

Activity 33

Can you think of limitations to the soundex scheme?


Comment

The scheme was developed for English surnames and so is based on the sounds of spoken English. It also works reasonably for related languages such as French and German but less well for dissimilar languages. Even for English names it does not always perform as you might hope; for example, check the coding for Thompson and Thomson using the soundex calculator above.

Last modified: Thursday, 2 August 2012, 12:30 PM