How to change PDF text encoding? (ANSI to Unicode)
I have a problem copying text out of a PDF. I need to insert the text into an HTML page, but when I copy it, some of the letters (the ones with diacritics, like Ț or Ș) are left out, so the words containing them are no longer correct.
I found out that this is because the PDF uses ANSI font encoding while the browser uses Unicode. How can I convert the ANSI encoding in the PDF to Unicode?
If the problem is indeed what you describe, Notepad++ should do what you want, and it's free. Create a new document in Notepad++, make sure 'Encode in ANSI' is selected in the Encoding menu, paste the text there, then choose 'Convert to UTF-8 without BOM' in the Encoding menu.
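The same conversion can also be scripted. Here is a minimal Python sketch, assuming the text was saved as raw bytes in a Windows ANSI code page. The function name `ansi_to_utf8` and the default `cp1250` (the Central European code page) are illustrative assumptions, not something stated in the question; the right code page depends on the system the text came from.

```python
def ansi_to_utf8(raw: bytes, codepage: str = "cp1250") -> bytes:
    """Re-encode bytes from a Windows ANSI code page to UTF-8.

    `codepage` is an assumption: pick the one the text was really
    saved in (for example cp1250 or cp1252).
    """
    # Decode with the legacy code page to get real Unicode text,
    # then encode that text as UTF-8 (Python writes no BOM here).
    text = raw.decode(codepage)
    return text.encode("utf-8")
```

Calling `ansi_to_utf8(data)` on bytes read from a file opened in binary mode returns UTF-8 bytes that can be written straight into the HTML page, provided the page declares UTF-8 as its charset.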
You can also try Decoder, a free online tool for fixing encoding problems. It's in Russian, but usage is pretty straightforward: paste the mangled text into the text box and hit the button that says "Расшифровать" ("Decode").
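What a decoder like this does can be approximated in a few lines. This is a rough Python sketch, assuming the mangled text is UTF-8 that was misread as Windows-1252; `fix_mojibake` is a hypothetical helper, and the online tool itself may work differently.

```python
def fix_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Undo one round of 'UTF-8 bytes read as wrong_encoding'.

    Returns the input unchanged if it does not look like that kind
    of damage.
    """
    try:
        # Turn the mangled characters back into the original bytes,
        # then decode those bytes the way they were meant to be read.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in the wrong encoding, or not valid
        # UTF-8 afterwards: leave the text as it is.
        return text
```

This only helps when no bytes were lost along the way. If some characters were dropped during copying, the round trip cannot bring them back and the function returns the text unchanged.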