I'm trying to parse a document with Apache Tika, but some character sequences ("ti" and "fb", for example) are replaced with an unknown Unicode symbol in the extracted text. I don't see a way to handle this in Tika itself, as the replacement character seems to come from PDFBox.
I also noticed that the character sequences in question are not part of PDFBox's GlyphList. Would it be possible to add these sequences, together with a mapping, to the GlyphList to get the expected output? I'm using Tika 1.21 with PDFBox 2.0.15.
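
To help narrow down where the substitutions happen, here is a minimal sketch that scans the extracted text for the replacement symbol and prints some surrounding context. It assumes the unknown symbol is U+FFFD (REPLACEMENT CHARACTER), which may not be the case for every document. The class name `ReplacementCharFinder`, the method `findReplacementChars`, and the sample string are hypothetical and only stand in for the real Tika output; the sketch uses plain Java and no Tika or PDFBox calls.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplacementCharFinder {
    // Assumption: the "unknown symbol" in the output is U+FFFD.
    public static final char REPLACEMENT = '\uFFFD';

    /** Returns every index at which U+FFFD occurs in the given text. */
    public static List<Integer> findReplacementChars(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == REPLACEMENT) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the text returned by Tika.
        String extracted = "func\uFFFDon and su\uFFFDix";

        for (int pos : findReplacementChars(extracted)) {
            // Print a few characters of context around each hit.
            int start = Math.max(0, pos - 5);
            int end = Math.min(extracted.length(), pos + 6);
            System.out.println(pos + ": " + extracted.substring(start, end));
        }
    }
}
```

Comparing the reported positions against the original PDF should show which glyphs fail to map, and so which entries would be needed in the glyph mapping.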