PDF viewers do not render all Tamil letters as expected.
Below is how the content actually renders in a PDF viewer.
From my understanding, Tamil letters need a substitution or reordering step in these three cases:
1. Reverse the glyphs:
கெ = க + ெ = க ெ -> ெ + க = கெ
2. Split and reorder the glyphs:
கொ = க + ொ = க ொ -> க + ெ + ா -> ெ + க + ா = கொ
3. Substitute a new glyph for a sequence of glyphs. The new glyph has no Unicode code point; it exists only in the font file:
கு = க + ு = க ு -> கு
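To make the first two cases concrete, here is a minimal plain-Java sketch of the reordering, independent of PDFBox. The class and method names (`TamilReorder`, `toVisualOrder`) are made up for illustration. It assumes only simple consonant + vowel sign pairs and ignores conjuncts and other shaping rules, so a real solution would still need the font's GSUB data.

```java
class TamilReorder {
    // Vowel signs drawn to the left of the consonant: U+0BC6, U+0BC7, U+0BC8.
    static boolean isLeftVowelSign(int cp) {
        return cp == 0x0BC6 || cp == 0x0BC7 || cp == 0x0BC8;
    }

    // Tamil consonants sit in U+0B95..U+0BB9 (the range has unassigned gaps).
    static boolean isConsonant(int cp) {
        return cp >= 0x0B95 && cp <= 0x0BB9;
    }

    // Two-part vowel signs and their canonical decompositions.
    static int[] split(int cp) {
        switch (cp) {
            case 0x0BCA: return new int[] {0x0BC6, 0x0BBE};
            case 0x0BCB: return new int[] {0x0BC7, 0x0BBE};
            case 0x0BCC: return new int[] {0x0BC6, 0x0BD7};
            default:     return new int[] {cp};
        }
    }

    static String toVisualOrder(String logical) {
        // Step 1: split every two-part vowel sign into its two halves.
        StringBuilder splitText = new StringBuilder();
        logical.codePoints().forEach(cp -> {
            for (int part : split(cp)) {
                splitText.appendCodePoint(part);
            }
        });
        int[] cps = splitText.codePoints().toArray();

        // Step 2: move a left-side vowel sign in front of its consonant.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cps.length; i++) {
            if (i + 1 < cps.length && isConsonant(cps[i]) && isLeftVowelSign(cps[i + 1])) {
                out.appendCodePoint(cps[i + 1]).appendCodePoint(cps[i]);
                i++; // the vowel sign has already been emitted
            } else {
                out.appendCodePoint(cps[i]);
            }
        }
        return out.toString();
    }
}
```

With this sketch, U+0B95 U+0BC6 becomes U+0BC6 U+0B95 (case 1) and U+0B95 U+0BCA becomes U+0BC6 U+0B95 U+0BBE (case 2). Case 3 is left untouched because it needs a glyph that only the font knows about.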
Input text | Char list from JDK | Code points from JDK | GID in TTF | Actual* | Expected | Notes
---|---|---|---|---|---|---
கெ | க + ெ | 2965 (U+0B95), 3014 (U+0BC6) | 1828 1856 | க + ெ = க ெ | ெ + க = கெ | Reversing the glyphs is expected.
கொ | க + ொ | 2965 (U+0B95), 3018 (U+0BCA) | 1828 1859 | க + ொ = க ொ | க + ெ + ா -> ெ + க + ா = கொ | Split and reorder is expected.
கு | க + ு | 2965 (U+0B95), 3009 (U+0BC1) | 1828 1854 | க + ு = க ு | கு (gid = 6698) | A new glyph is expected. The new glyph has no Unicode code point; it exists only in the font file.
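For reference, the code point column can be reproduced with plain JDK calls. This is only a sketch; `CodePointDump` and `describe` are illustrative names, not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDump {
    // Returns one "decimal U+HEX" entry per code point of the input text.
    static List<String> describe(String text) {
        List<String> rows = new ArrayList<>();
        text.codePoints().forEach(cp ->
                rows.add(String.format("%d U+%04X", cp, cp)));
        return rows;
    }
}
```

Padding the hex value to four digits gives the usual `U+0B95` notation for Tamil code points.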
How can these substitutions be handled efficiently?
I have been looking at GlyphSubstitutionTable, fontbox.cmap.Identity-H and fontbox.unicode.Scripts.txt, but have not worked it out so far. Any help would be appreciated.
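For the third case, what I think is needed is roughly the following: a lookup from a glyph ID sequence to a single replacement glyph ID, applied with longest match first. The sketch below is plain Java and does not use any FontBox API. The `LigatureSketch` class and the hand-built map are assumptions for illustration, and in practice the map would have to be filled from the font's GSUB table. The glyph IDs are the ones from the table above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LigatureSketch {
    // Replaces every known glyph ID sequence with its ligature glyph ID.
    // 'ligatures' is a hypothetical stand-in for data read from GSUB.
    static List<Integer> applyLigatures(List<Integer> gids,
                                        Map<List<Integer>, Integer> ligatures) {
        int maxLen = 1;
        for (List<Integer> key : ligatures.keySet()) {
            maxLen = Math.max(maxLen, key.size());
        }
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < gids.size()) {
            boolean matched = false;
            // Try the longest candidate sequence first.
            for (int len = Math.min(maxLen, gids.size() - i); len >= 2; len--) {
                Integer ligature = ligatures.get(gids.subList(i, i + len));
                if (ligature != null) {
                    out.add(ligature);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.add(gids.get(i)); // no substitution, keep the glyph
                i++;
            }
        }
        return out;
    }
}
```

With a map containing `[1828, 1854] -> 6698`, the input `[1828, 1854]` would become `[6698]`, while `[1828, 1856]` would pass through unchanged.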
Links: Font, Actual, Expected, Use cases, PDFBox Jira