How optical character recognition (OCR) works

Optical character recognition (OCR) is software that converts a printed, typed, or handwritten document into a digital version computers can read, so no one has to retype or re-enter the text by hand.

OCR is typically used on scanned PDF documents, but it can also create a machine-readable version of the text within an image file.

What is OCR?

OCR, also known as text recognition, is a software technology that converts the characters in printed or written documents (letters, numbers, and punctuation, also called glyphs) into an electronic format that computers and other software can recognize and read more easily. Some OCR programs do this while a document is being scanned or photographed with a digital camera. Others apply the process afterward, to documents that were scanned or photographed without OCR. Once OCR has been applied, users can search PDF documents, edit the text, and reformat the document.

What is OCR used for?

For quick, everyday scanning, OCR may not matter much. If you scan a large number of documents, though, being able to search your PDFs for the exact one you need saves time, and OCR support in your scanning program becomes more important. Here are some other things OCR helps with:

  • Automating data processing and entry (example: applicant tracking systems for resumes).
  • Making scanned books searchable.
  • Converting handwritten scans into machine-readable text.
  • Making documents more usable by reading programs that assist visually impaired users.
  • Preserving historical documents and newspapers, and making them searchable.
  • Extracting data and transferring it to accounting programs (example: receipts and invoices).
  • Indexing documents for use by search engines.
  • Recognizing license plates in speed camera and red light camera software.
  • Speech synthesizers for people who cannot speak: theoretical physicist Stephen Hawking is perhaps the best-known user of a speech synthesis program.
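To make the receipts-and-invoices item concrete, here is a minimal sketch of what can happen after OCR has run: once the document is plain text, an ordinary pattern search can pull out a value such as the total. The receipt text, the `TOTAL_PATTERN` expression, and the `extract_total` helper are all made up for illustration; real OCR output is messier and real accounting software will differ.

```python
import re
from decimal import Decimal

# Hypothetical text, as an OCR program might return it for a scanned receipt.
SAMPLE_RECEIPT_TEXT = """\
CORNER CAFE
Coffee        3.50
Bagel         2.25
TOTAL         5.75
"""

# A line that starts with TOTAL, followed by an amount with two decimals.
# Anchoring at the start of the line keeps "SUBTOTAL" from matching.
TOTAL_PATTERN = re.compile(
    r"^\s*TOTAL\s+\$?(\d+[.,]\d{2})\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_total(ocr_text):
    """Return the receipt total as a Decimal, or None if no total line is found."""
    match = TOTAL_PATTERN.search(ocr_text)
    if match is None:
        return None
    # Accept a decimal comma as well as a decimal point.
    return Decimal(match.group(1).replace(",", "."))


print(extract_total(SAMPLE_RECEIPT_TEXT))
```

None of this would be possible on the original image; the search only works because OCR has already turned the pixels into characters.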

Why use OCR?

Why not just take a picture? Because a picture is only an image: you can't edit the text in it or search for words in it. Scanning the document and running OCR software turns the file into something you can edit and search.

OCR history

The very first use of text recognition dates back to 1914, but the development and widespread use of OCR-related technologies began in earnest in the 1950s, most notably with the creation of highly simplified fonts whose characters were easier to convert into digitally readable text. The first of these simplified fonts was created by David Shepard and is commonly known as OCR-7B. OCR-7B is still used in the financial industry today as the standard font on credit and debit cards. In the 1960s, postal services in several countries, including the United States, Great Britain, Canada, and Germany, began using OCR technology to dramatically speed up mail sorting. OCR is still the core technology used to sort mail for postal services around the world. In 2000, knowledge of the limitations and capabilities of OCR technology was used to develop the CAPTCHA programs that block bots and spammers.

Over the decades, OCR has become more accurate and sophisticated thanks to advances in related technology areas such as artificial intelligence, machine learning, and machine vision. Today, OCR software uses pattern recognition, feature detection and text extraction to transform documents faster and more accurately than ever.
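As a rough illustration of the pattern recognition idea, the sketch below compares a tiny character bitmap against a few stored templates and picks the closest one. This is a toy, not how any particular OCR product works: the 3x5 templates and the `pixel_agreement` and `recognize` names are assumptions made up for this example, and modern OCR software relies on far more sophisticated, usually machine-learned, models.

```python
# Hypothetical 3x5 bitmaps ("templates") for a few glyphs.
# "#" is an inked pixel, "." is blank paper.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def pixel_agreement(bitmap_a, bitmap_b):
    """Count the pixels on which two equal-sized bitmaps agree."""
    return sum(
        pixel_a == pixel_b
        for row_a, row_b in zip(bitmap_a, bitmap_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize(bitmap):
    """Return the template character whose bitmap agrees with the input most."""
    return max(TEMPLATES, key=lambda char: pixel_agreement(bitmap, TEMPLATES[char]))


# A "7" with one pixel flipped, as might happen with a smudged scan.
noisy_seven = ["###", "..#", ".#.", "##.", ".#."]
print(recognize(noisy_seven))  # prints 7
```

The noisy input still matches "7" on 14 of 15 pixels and the other templates on only 7, which is why simplified, highly distinct fonts made early OCR so much easier.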

FAQ
  • How can I scan documents with my phone or tablet?

    On iOS, open the Notes app and create a new note. Open the camera, then tap Scan Documents. On Android, open Google Drive and select the plus sign (+), then tap Scan to scan the document with your phone.

  • How do you use OCR in Adobe Acrobat?

    Open a PDF file containing a scanned image, then select Tools > Edit PDF. Acrobat will automatically apply OCR so you can edit the text. Just select where you want to make changes and start typing.

  • What is the difference between OCR and OMR?

    Optical mark recognition (OMR) is software that detects marks on paper, typically on a bubble sheet. OMR is used to process the results of exams, polls, questionnaires, and even elections. Unlike OCR, OMR cannot decipher the marks on the page; it only verifies that the marks are present.
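The OMR answer above can be sketched in a few lines. The example assumes a made-up, already-aligned strip of black-and-white pixels holding four answer bubbles, and it only checks how dark each bubble is. The pixel data, the bubble size, the 0.7 threshold, and the function names are all hypothetical; real OMR software also has to locate and straighten the sheet first.

```python
# Hypothetical scanned answer row: 1 is a dark pixel, 0 is a light pixel.
# Each bubble is a 3x3 block, and bubbles A to D sit side by side.
# An empty bubble shows only its printed outline; bubble B is filled in.
ROW = [
    [0, 1, 0,  1, 1, 1,  0, 1, 0,  0, 1, 0],
    [1, 0, 1,  1, 1, 1,  1, 0, 1,  1, 0, 1],
    [0, 1, 0,  1, 1, 1,  0, 1, 0,  0, 1, 0],
]

BUBBLE_WIDTH = 3      # assumed bubble size in pixels
FILL_THRESHOLD = 0.7  # assumed share of dark pixels that counts as "marked"


def fill_ratio(pixels, bubble_index):
    """Return the share of dark pixels inside one bubble."""
    start = bubble_index * BUBBLE_WIDTH
    cells = [p for row in pixels for p in row[start:start + BUBBLE_WIDTH]]
    return sum(cells) / len(cells)


def marked_bubbles(pixels, labels="ABCD"):
    """Return the labels of bubbles dark enough to count as marked."""
    return [
        label
        for index, label in enumerate(labels)
        if fill_ratio(pixels, index) >= FILL_THRESHOLD
    ]


print(marked_bubbles(ROW))  # prints ['B']
```

Note that the code never tries to work out what the mark looks like. It only decides whether a mark is present, which is the difference from OCR.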
