
I am using PDFBox to extract character coordinates from a PDF. However, I can't identify the unit of measurement of the values returned by the getXDirAdj() and getYDirAdj() methods.

@Override
protected void processTextPosition(TextPosition text) {
    String tChar = text.getCharacter();
    // Print the direction-adjusted position and size metrics for each character.
    System.out.println("String[" + text.getXDirAdj() + ","
            + text.getYDirAdj() + " fs=" + text.getFontSize() + " xscale="
            + text.getXScale() + " height=" + text.getHeightDir() + " space="
            + text.getWidthOfSpace() + " width="
            + text.getWidthDirAdj() + "]" + tChar);
}
user196572
  • Related: https://stackoverflow.com/a/50335516/1729265 and in particular https://stackoverflow.com/a/57114889/1729265 concerning "text direction adjusted coordinates" – mkl Jul 08 '20 at 06:29
  • 1 unit = 1/72 inch – Tilman Hausherr Jul 08 '20 at 07:52
  • Thanks for the answer, it was very useful. Another question arose: how can I obtain the rotation of the character read (as in the posted example)? – user196572 Jul 08 '20 at 23:43

1 Answer

  1. 1 unit = 1/72 inch

  2. "how to obtain the rotation of the character read": from the ExtractText.java tool:

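    static int getAngle(TextPosition text)
    {
        // Work on a copy so the TextPosition's own matrix is not modified.
        Matrix m = text.getTextMatrix().clone();
        m.concatenate(text.getFont().getFontMatrix());
        // Derive the rotation from the combined matrix, rounded to whole degrees.
        return (int) Math.round(Math.toDegrees(Math.atan2(m.getShearY(), m.getScaleY())));
    }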
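
To make point 1 concrete, here is a minimal sketch that converts coordinate values such as those returned by getXDirAdj() and getYDirAdj() into inches and millimetres. The class name `PdfUnits` and its methods are hypothetical helpers, not part of PDFBox, and the sketch assumes the default user space of 72 units per inch (a page that sets a different UserUnit would need a different factor).

```java
// Hypothetical helper, not part of PDFBox: converts PDF user-space units
// to physical lengths, assuming the default of 72 units per inch.
final class PdfUnits {
    static final double UNITS_PER_INCH = 72.0; // assumption: default user space
    static final double MM_PER_INCH = 25.4;

    private PdfUnits() {
    }

    /** Converts a length in PDF units to inches. */
    static double toInches(double units) {
        return units / UNITS_PER_INCH;
    }

    /** Converts a length in PDF units to millimetres. */
    static double toMillimetres(double units) {
        return toInches(units) * MM_PER_INCH;
    }

    public static void main(String[] args) {
        // Example value standing in for text.getXDirAdj(); not real extracted data.
        double x = 144.0;
        System.out.println(x + " units = " + toInches(x) + " in = "
                + toMillimetres(x) + " mm");
    }
}
```

Inside processTextPosition, a call such as `PdfUnits.toMillimetres(text.getXDirAdj())` would then give the position in millimetres under the same assumption.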
    
mkl
Tilman Hausherr