Finding your Ancestors in The Newspapers

Different search options

Newspapers hold so much information on our ancestors and could potentially hold the key to unlocking mysteries about who they were and what they did. The more newspapers are digitized and made available online, the more we can learn.

Search options do not read the page image itself. They scan computer-generated text, which is produced from the image by a process called optical character recognition, or OCR.

“Optical character recognition (OCR) is a fully automated process that converts the visual image of numbers and letters into computer-readable numbers and letters. Computer software can then search the OCR-generated text for words, phrases, numbers, or other characters. However, OCR is not 100 percent accurate, and, particularly if the original item has extraneous markings on the page, unusual text styles, or very small fonts, the searchable text OCR generates will contain errors that cannot be corrected by automated means. Although errors in the process are unavoidable, OCR is still a powerful tool for making text-based items accessible to searching.”

(Library of Congress)

Newer newspapers, I have noticed, have cleaner text, and thus the OCR is more accurate. Many older newspapers were microfilmed years before being digitized, which can hurt the legibility of the content. Dust and scratches on the microfilm, along with ink blots or inconsistencies in the newsprint, can lead to strange outcomes. The computer picks up all the little flecks and ink blots and generates text for them as well. I have found ‘~’ in the middle of words, and ‘*’ and ‘^’ in various places, among other things. An ‘e’ often shows up as ‘o’ or ‘c’, and an ‘o’ or ‘c’ can just as easily show up as ‘e’.
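If you are able to copy or download the OCR text for a page, a small script can remove some of this clutter before you search it yourself. The following is only a rough sketch in Python: the function name `strip_ocr_noise` and the list of noise characters are my own assumptions, based on the stray symbols mentioned above, and a real page may need a different list.

```python
import re

# Stray symbols reported in the OCR text; this list is an assumption
# and should be extended to match whatever shows up in your own pages.
NOISE_CHARS = "~*^"


def strip_ocr_noise(text):
    """Delete stray noise symbols, then tidy up leftover double spaces."""
    # Remove every occurrence of the noise characters.
    cleaned = text.translate(str.maketrans("", "", NOISE_CHARS))
    # Removing a free-standing symbol leaves two spaces; collapse them.
    return re.sub(r" {2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_ocr_noise("Sm~ith died * on ^ Tuesday"))
```

This only handles symbols that should never appear in a name. It does nothing about letters that were swapped for other letters, which is the harder problem.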

When searching for your ancestors’ names, try different variations of how a computer might perceive the letters. For example, the name Smith might be generated as Srnitln or Smltln. Don’t rule out special characters either: ‘S’ might be found as ‘$’, ‘D’ and ‘B’ can be interchanged, and ‘ri’ in place of ‘n’ occurs often.
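Writing out every variation by hand gets tedious, so here is a minimal Python sketch that builds a list of search terms from a table of likely misreadings. The table `CONFUSIONS` and the function `ocr_variants` are hypothetical names of my own, and the substitutions come only from the examples in this post; they are not a complete list of what OCR software gets wrong.

```python
from itertools import product

# Assumed confusion table, taken from the examples in this post.
# Each letter maps to itself plus the ways OCR might render it.
CONFUSIONS = {
    "S": ["S", "$"],
    "D": ["D", "B"],
    "B": ["B", "D"],
    "m": ["m", "rn"],
    "n": ["n", "ri"],
    "h": ["h", "ln"],
    "i": ["i", "l"],
    "e": ["e", "o", "c"],
    "o": ["o", "e"],
    "c": ["c", "e"],
}


def ocr_variants(name, limit=200):
    """Return up to `limit` spellings of `name`, the original first."""
    # For each letter, list its possible renderings (itself if unknown).
    options = [CONFUSIONS.get(ch, [ch]) for ch in name]
    variants = []
    # Combine one rendering per letter in every possible way.
    for combo in product(*options):
        variants.append("".join(combo))
        if len(variants) >= limit:
            break
    return variants


if __name__ == "__main__":
    for term in ocr_variants("Smith"):
        print(term)
```

For Smith, the list includes Srnitln and Smltln along with the correct spelling. Longer names produce many combinations, which is why the sketch stops at a limit; try the most likely terms first.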

Look at the font of the newspaper you are searching and compare it to the OCR. Make some guesses as to how a computer might “see” the font and how it might misread it. Try these new terms in the search field and see whether any come up with a match.

There are times when you know the date your ancestor died, and there is newspaper coverage for that date, but nothing comes up in the searches. When this occurs, you will have to read the paper yourself to find it. I once found an obituary this way: the OCR was a series of lines and symbols because the digitized microfilm was washed out and the text was hardly legible. The computer program had no chance of figuring that one out.
