How to change PDF text encoding? (ANSI to Unicode)


Question

I have a problem with a PDF I am trying to copy text from. I need to insert the text into an HTML page, but when I copy it, some of the letters (the ones with diacritics, such as Ț or Ș) are left out, so the words containing them are no longer correct.

I found out that this is because the PDF uses ANSI font encoding while the browser uses Unicode. How can I convert the ANSI encoding in the PDF to Unicode?

2014/06/24
1
6
6/24/2014 12:46:00 PM

If the problem is indeed what you describe, Notepad++ should do what you want, and it's free. Create a new document in Notepad++, make sure 'Encode in ANSI' is selected in the Encoding menu, paste the text there, then choose 'Convert to UTF-8 without BOM' in the same menu.
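
If you would rather script it, the same re-encoding step can be sketched in Python. This is only an illustration, not what Notepad++ does internally: the function name `convert_ansi_to_utf8` is made up for this example, and the source code page is an assumption you have to supply yourself, because "ANSI" on Windows means whatever the system locale uses (cp1252 for Western European, cp1250 for Central European, and so on).

```python
def convert_ansi_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from a Windows "ANSI" code page to UTF-8.

    source_encoding is an assumption: pick the code page the text was
    really saved in (e.g. "cp1250" for Central European text).
    """
    # Decode the raw bytes into a Unicode string, then encode that string
    # as UTF-8. Python's "utf-8" codec does not write a BOM.
    text = data.decode(source_encoding)
    return text.encode("utf-8")


if __name__ == "__main__":
    # 0xE9 is "é" in cp1252; in UTF-8 it becomes the two bytes C3 A9.
    print(convert_ansi_to_utf8(b"caf\xe9"))
```

Note that this only helps if the copied bytes really are in that code page. If the characters were dropped during the copy from the PDF, there is nothing left to convert.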

You can also try Decoder, a free online tool for fixing encoding problems. It's in Russian, but usage is pretty straightforward: paste the mangled text into the text box and hit the button that says "Расшифровать" ("Decode").
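
For reference, here is a rough Python sketch of the kind of repair such a tool attempts. I don't know how Decoder itself works, so treat this as an assumption-laden illustration: `repair_mojibake` is a hypothetical helper, and it only covers the single case where UTF-8 bytes were wrongly read as cp1252 (both encoding names are parameters you would need to adjust).

```python
def repair_mojibake(mangled: str, wrong: str = "cp1252", right: str = "utf-8") -> str:
    """Try to undo text that was decoded with the wrong encoding.

    Assumes the text was originally `right`-encoded bytes that some
    program mistakenly decoded as `wrong`. Returns the input unchanged
    if the round trip is not possible.
    """
    try:
        # Turn the mangled characters back into the original bytes,
        # then decode those bytes with the encoding they were really in.
        return mangled.encode(wrong).decode(right)
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError:
        # the text was not mangled in the way we assumed.
        return mangled


if __name__ == "__main__":
    # "café" saved as UTF-8 but read as cp1252 shows up as "cafÃ©".
    print(repair_mojibake("caf\u00c3\u00a9"))
```

A real tool would try many encoding pairs and pick the most plausible result; this sketch tries exactly one pair.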

2012/07/24
4
7/24/2012 5:49:00 AM