Clean up the background of a scanned document


Could anyone please teach me how to clean up the background of a scanned document, using PS?

Does PS have a text recognition function that I can use to extract the text? Essentially, the text is all I want.



10/15/2015 4:18:00 PM

That's a very bad scan! Even software with OCR might not be able to decode this because of the lack of contrast.

It looks like you used a setting on your scanner that converts the page to black and white only; try scanning in color or grayscale instead. If you have a gray background and black text, it will be much easier to make one black and the other white. Right now everything is pure black and white, which makes it very difficult to separate the text from the background.
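To illustrate why a grayscale scan is easier to clean up, here is a minimal Python sketch of a plain threshold. It is not what Photoshop or your scanner does internally; the pixel values, the `threshold` function and the cutoff of 100 are all made up for the example. The point is only that when the background sits around one gray level and the text around a much darker one, a single cutoff separates them.

```python
def threshold(pixels, cutoff=128):
    """Map each grayscale value (0-255) to pure black (0) or pure white (255)."""
    return [[0 if value < cutoff else 255 for value in row] for row in pixels]


# Hypothetical 3x4 grayscale patch: dark text (about 30) on a gray background (about 170).
scan = [
    [170, 165, 30, 172],
    [168, 25, 35, 169],
    [171, 166, 28, 174],
]

# Any cutoff between the two groups of values works; 100 is an arbitrary choice.
cleaned = threshold(scan, cutoff=100)
for row in cleaned:
    print(row)
```

With a scan that is already pure black and white, the background speckles and the text both have the value 0, so no cutoff can tell them apart.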

If this doesn't work, try photocopying your pages with the contrast adjusted on the photocopier, then scan the photocopies. This sometimes helps get rid of the background noise. It's an extra step, but you might get a clearer scanned image. It's a good trick if you don't want to spend much time, or any time at all, adjusting the levels in Photoshop. A photocopier's contrast control can act much like the threshold or curves functions in photo editing software.

Then you can use the OCR feature in Adobe Acrobat Pro or any other software that has one, but be aware that the text will still need some editing because OCR doesn't always give perfect results. OCR converts your scanned text into real, editable text.

In Photoshop, you can also adjust the contrast between the text and the background with a Levels adjustment, to make the background white and the text black (for a similar technique, see this trick: "how to compensate 50% opacity white over photograph").
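As a rough idea of what moving the black and white input sliders does, here is a small Python sketch. It is an approximation under my own assumptions, not Photoshop's actual code: it ignores the midtone (gamma) slider, and the function name `apply_levels` and the sample values are hypothetical. Everything at or below the black point becomes 0, everything at or above the white point becomes 255, and values in between are stretched linearly.

```python
def apply_levels(value, black_point, white_point):
    """Linearly remap one grayscale value (0-255) between two input points, with clipping."""
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scaled = (value - black_point) / (white_point - black_point)
    scaled = min(1.0, max(0.0, scaled))  # clip to the 0..1 range
    return int(scaled * 255 + 0.5)       # round to the nearest integer


# Hypothetical values: text around 30, background around 170.
text_pixel = apply_levels(30, black_point=40, white_point=160)
background_pixel = apply_levels(170, black_point=40, white_point=160)
midtone_pixel = apply_levels(100, black_point=40, white_point=160)
print(text_pixel, background_pixel, midtone_pixel)
```

Pulling the white point below the background's gray level is what turns the background white, and pushing the black point above the text's level is what makes the text solid black.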

Similar tricks from Smithsonian Institution Archives:

4/13/2017 12:46:00 PM