
I want to convert text from T1 encoding to UTF-8 in Java. I read a PDF and parse it into a String using Tika, then change the encoding with the decode and encode methods of the Charset class. This works fine for common encodings such as windows-1252 or UTF-8, but I cannot find a Charset class for the T1 encoding.
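For charsets the JDK does ship, the decode/encode round trip can be sketched as below. This assumes the String from Tika holds one char per original byte (the bytes were effectively read as ISO-8859-1); `Recode` and `recode` are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Recode {
    // Hypothetical helper: turn each char (assumed to be U+0000..U+00FF)
    // back into its original byte, then decode those bytes with the real charset.
    static String recode(String misDecoded, Charset actual) {
        ByteBuffer raw = StandardCharsets.ISO_8859_1.encode(misDecoded);
        return actual.decode(raw).toString();
    }

    public static void main(String[] args) {
        // Byte 0x80 is the euro sign in windows-1252 but a control char in ISO-8859-1.
        String fixed = recode("\u0080 5", Charset.forName("windows-1252"));
        System.out.println(fixed.equals("\u20AC 5")); // prints true
    }
}
```

As the question notes, there is no built-in T1 charset to pass as `actual`, which is exactly the missing piece.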

Text sample:

Przykªady zastosowa«

I can properly decode this text on this page: http://kanjidict.stc.cx/recode.php
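The sample itself hints at what happened: in T1 (Cork), byte 0xAA is ł and 0xAB is ń, while in ISO-8859-1 the same bytes are ª and «. So "Przykªady zastosowa«" looks like "Przykłady zastosowań" with each T1 byte read as Latin-1. Assuming that holds for the whole Tika output, a minimal sketch of a repair remaps the range 0x80 to 0xBF through the Cork table. `CorkFix` and `fromCorkAsLatin1` are hypothetical names; all other positions are passed through unchanged, which is only an approximation, since T1 also differs from ASCII/Latin-1 at a few positions outside that range.

```java
final class CorkFix {
    // T1 (Cork) characters for bytes 0x80..0xBF, 16 per row.
    private static final char[] CORK_80_BF = {
        // 0x80: Ă Ą Ć Č Ď Ě Ę Ğ Ĺ Ľ Ł Ń Ň Ŋ Ő Ŕ
        '\u0102', '\u0104', '\u0106', '\u010C', '\u010E', '\u011A', '\u0118', '\u011E',
        '\u0139', '\u013D', '\u0141', '\u0143', '\u0147', '\u014A', '\u0150', '\u0154',
        // 0x90: Ř Ś Š Ş Ť Ţ Ű Ů Ÿ Ź Ž Ż Ĳ İ đ §
        '\u0158', '\u015A', '\u0160', '\u015E', '\u0164', '\u0162', '\u0170', '\u016E',
        '\u0178', '\u0179', '\u017D', '\u017B', '\u0132', '\u0130', '\u0111', '\u00A7',
        // 0xA0: ă ą ć č ď ě ę ğ ĺ ľ ł ń ň ŋ ő ŕ
        '\u0103', '\u0105', '\u0107', '\u010D', '\u010F', '\u011B', '\u0119', '\u011F',
        '\u013A', '\u013E', '\u0142', '\u0144', '\u0148', '\u014B', '\u0151', '\u0155',
        // 0xB0: ř ś š ş ť ţ ű ů ÿ ź ž ż ĳ ¡ ¿ £
        '\u0159', '\u015B', '\u0161', '\u015F', '\u0165', '\u0163', '\u0171', '\u016F',
        '\u00FF', '\u017A', '\u017E', '\u017C', '\u0133', '\u00A1', '\u00BF', '\u00A3',
    };

    // Hypothetical helper: assumes every char in s is a T1 byte that was read as ISO-8859-1.
    static String fromCorkAsLatin1(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Only 0x80..0xBF are remapped; everything else is kept as-is.
            sb.append(c >= 0x80 && c <= 0xBF ? CORK_80_BF[c - 0x80] : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fromCorkAsLatin1("Przyk\u00AAady zastosowa\u00AB"));
    }
}
```

Applied to the sample above, this yields "Przykłady zastosowań". Once the text is a correct Java String, writing it out as UTF-8 is just a matter of using `StandardCharsets.UTF_8`.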

bartektartanus
    Seeing that the [T1 (Cork) encoding](http://en.wikipedia.org/wiki/Cork_encoding) is not that difficult, you could build your own class. – Uyghur Lives Matter Oct 02 '14 at 18:26
    I think that creating your own Cork Charset and a [CharsetProvider](http://docs.oracle.com/javase/8/docs/api/java/nio/charset/spi/CharsetProvider.html) and [loading it manually](http://stackoverflow.com/questions/6308587/loading-a-java-charset-manually) would be the best way to go. – David Conrad Oct 02 '14 at 18:46
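Following the custom Charset suggestion, a decode-only Charset might look like the sketch below. `CorkCharset` and the name `x-cork-t1` are hypothetical; it maps bytes 0x80 to 0xBF through the Cork table and decodes every other byte as ISO-8859-1, so it only approximates full T1. It can be used directly without registration; making `Charset.forName` find it would additionally require a CharsetProvider subclass listed in `META-INF/services/java.nio.charset.spi.CharsetProvider`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical decode-only charset; the name "x-cork-t1" is made up for this sketch.
final class CorkCharset extends Charset {
    // T1 characters for bytes 0x80..0xBF; other bytes decode as ISO-8859-1.
    private static final char[] HIGH = {
        '\u0102', '\u0104', '\u0106', '\u010C', '\u010E', '\u011A', '\u0118', '\u011E',
        '\u0139', '\u013D', '\u0141', '\u0143', '\u0147', '\u014A', '\u0150', '\u0154',
        '\u0158', '\u015A', '\u0160', '\u015E', '\u0164', '\u0162', '\u0170', '\u016E',
        '\u0178', '\u0179', '\u017D', '\u017B', '\u0132', '\u0130', '\u0111', '\u00A7',
        '\u0103', '\u0105', '\u0107', '\u010D', '\u010F', '\u011B', '\u0119', '\u011F',
        '\u013A', '\u013E', '\u0142', '\u0144', '\u0148', '\u014B', '\u0151', '\u0155',
        '\u0159', '\u015B', '\u0161', '\u015F', '\u0165', '\u0163', '\u0171', '\u016F',
        '\u00FF', '\u017A', '\u017E', '\u017C', '\u0133', '\u00A1', '\u00BF', '\u00A3',
    };

    public CorkCharset() {
        super("x-cork-t1", null); // no aliases
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof CorkCharset;
    }

    @Override
    public boolean canEncode() {
        return false; // encoding is not implemented in this sketch
    }

    @Override
    public CharsetEncoder newEncoder() {
        throw new UnsupportedOperationException("x-cork-t1 sketch is decode-only");
    }

    @Override
    public CharsetDecoder newDecoder() {
        // Single-byte encoding: exactly one char per byte.
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) {
                        return CoderResult.OVERFLOW; // caller will provide more room
                    }
                    int b = in.get() & 0xFF;
                    out.put(b >= 0x80 && b <= 0xBF ? HIGH[b - 0x80] : (char) b);
                }
                return CoderResult.UNDERFLOW; // all input consumed
            }
        };
    }

    public static void main(String[] args) {
        // Assumption: Tika produced one Latin-1 char per T1 byte, so recover the bytes first.
        String tikaText = "Przyk\u00AAady zastosowa\u00AB";
        byte[] raw = tikaText.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new CorkCharset().decode(ByteBuffer.wrap(raw)).toString());
    }
}
```

Decoding alone is enough for this use case: the goal is a correct Java String, and from there `StandardCharsets.UTF_8` takes care of producing UTF-8 bytes.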
