Extract Text from Images or Scanned PDF

By Pau Monfort (@paumonfort)

Images (JPEG, BMP, GIF, PNG, and so on) and scanned PDFs have one thing in common: you cannot select, copy, or extract text from them, because the text is stored as pixels rather than as characters. So if you have a scanned document or an image containing important text that you need to modify or copy, the practical solution is to use an OCR program.
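A quick way to tell whether a PDF is a scan is to check whether its pages carry any extractable text at all. The following is a minimal sketch, assuming the third-party pypdf package is installed. The function names and the 20-character threshold are illustrative choices made for this example, not part of any tool discussed in this article.

```python
from typing import Iterable


def looks_scanned(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Return True when the pages contain (almost) no extractable text.

    min_chars is an arbitrary, illustrative threshold.
    """
    total = sum(len(text.strip()) for text in page_texts)
    return total < min_chars


def pdf_looks_scanned(path: str) -> bool:
    """Read a PDF with pypdf (assumed installed) and apply the heuristic."""
    from pypdf import PdfReader  # imported lazily: third-party dependency

    reader = PdfReader(path)
    # Pages that are pure images yield an empty string here.
    return looks_scanned(page.extract_text() or "" for page in reader.pages)
```

If pdf_looks_scanned returns True, the file most likely needs OCR before any text can be copied out of it.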

An OCR program is a tool built on optical character recognition, a technology that recognizes and extracts text from images or scanned PDFs. One of the best programs in this area is PDFElement, which we have already seen at work in the guide on how to extract text from a PDF document.



PDFElement is compatible with both Windows and Mac computers and is available in a “professional” version that includes OCR technology, useful for extracting text from scanned images or documents. Let's see below how it works and how simple it is to extract text from images.

How to Extract Text from Images or Scanned PDF

Step 1. Download and install PDFElement on your computer

A completely free demo version is available for download.

 

After installing and starting the program, you will see the following splash screen:

Step 2. Import the scanned image or PDF

Click OPEN FILE ... at the bottom left and select the scanned image or PDF file. For the tests in this article we created a JPEG image in Paint and put some text in it. Once that image was loaded into the program, here is what appeared:



Step 3. Perform OCR function

As you can see from the figure above, the program automatically detects that the file is an image and asks whether you want to run OCR to recognize the text in it. After clicking RUN OCR, you first select the language of the text and then start the scan. During the scan, this pop-up will appear asking you to wait until the procedure is complete:
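PDFElement performs this step through its interface, and this article does not cover its internals. As a rough illustration of the same two inputs (an image and the language of its text), here is a minimal sketch that uses the open-source Tesseract engine through the pytesseract package. That choice is an assumption of this example, not something PDFElement is stated to use. It requires Pillow, pytesseract, and a Tesseract installation with the relevant language data; the language table and function names are hypothetical.

```python
# Illustrative mapping from language names to Tesseract language codes.
TESSERACT_LANGS = {
    "english": "eng",
    "italian": "ita",
    "spanish": "spa",
    "french": "fra",
    "german": "deu",
}


def lang_code(name: str) -> str:
    """Translate a human-readable language name into a Tesseract code."""
    key = name.strip().lower()
    if key not in TESSERACT_LANGS:
        raise ValueError(f"unsupported language: {name!r}")
    return TESSERACT_LANGS[key]


def ocr_image(path: str, language: str = "english") -> str:
    """Run OCR on an image file and return the recognized text."""
    # Third-party imports are kept local so the helper above works without them.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang_code(language))
```

Choosing the right language matters because the engine loads language-specific data; recognizing Italian text with the English model, for example, tends to mangle accented characters.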


Step 4. Text extraction

After OCR, all the text contained in the image (or in the scanned PDF) becomes editable: you can copy it, modify it, delete it, highlight it, and so on.


At this point you can save the result either as a PDF or in Word, Excel, or PowerPoint format (in the HOME section, just click the icon of the desired output format).
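If you work with recognized text outside a GUI tool, it usually needs a little tidying before it is saved. The following is a minimal sketch using only the Python standard library. It writes plain UTF-8 text rather than Word or PDF, and the function names and cleanup rules are illustrative assumptions, not a description of what PDFElement does when exporting.

```python
import re
from pathlib import Path


def tidy_ocr_text(raw: str) -> str:
    """Normalize whitespace in OCR output while keeping paragraph breaks."""
    # Heuristic: join words hyphenated across a line break
    # ("recog-\nnition" becomes "recognition"); this can also join
    # genuine hyphenated compounds that happen to wrap.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Strip each line, then reduce three or more newlines to one blank line.
    text = "\n".join(line.strip() for line in text.splitlines())
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def save_text(text: str, out_path: str) -> Path:
    """Write the tidied text as UTF-8 and return the output path."""
    path = Path(out_path)
    path.write_text(tidy_ocr_text(text), encoding="utf-8")
    return path
```

For example, save_text(recognized, "page1.txt") would store the cleaned text next to the script, where recognized is whatever string your OCR step produced.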

 

Is there a free online tool? I can't download anything.

  • Try this: https://pdftotext.com/
    But I don't know if it supports scanned PDFs ...

  • You were very clear. I will try the program and report the outcome later. Thanks

  • I have yet to try it; I'll let you know
