
My problem is illustrated in this image: http://185.49.12.119/~pogdan/7spacedot/7spacedot.jpg
Input file: http://185.49.12.119/~pogdan/7spacedot/monitor_2016_99.pdf

Output file: http://185.49.12.119/~pogdan/7spacedot/monitor_2016_99.txt

All files, including the jar and the Java source: http://185.49.12.119/~pogdan/7spacedot/

Why does iText insert a space here, and how can I remove it? Replacing "7 ." with "7." does not solve the problem for me.

mkl
pogdan

1 Answer


Why does iText insert a space?

iText inserts a space whenever the gap between two consecutive text chunks is larger than a certain threshold (half the width of a space in the code quoted below), or when two consecutive text chunks overlap. It does so to signal that the chunks do not follow each other in the normal way.

In your document, a dot following a seven is often moved left as far as possible, so that the character bounding boxes overlap:

(Image: sample of an overlapping 7 and .)
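To see why the backstep triggers a space, here is a minimal model of the spacing check, written without iText. It assumes plain (x, y) baseline coordinates; `originalRuleInsertsSpace` and its parameters are hypothetical stand-ins for `lastEnd`, `start` and `renderInfo.getSingleSpaceWidth()`, and the space width of 2.5 is made up. Because only the length of the difference vector is used, a dot moved back onto the 7 counts exactly like a real gap of the same size:

```java
// Minimal model of iText's implied-space check, without iText.
// Assumption: chunk positions are plain (x, y) baseline points; the names are
// hypothetical stand-ins for lastEnd, start and getSingleSpaceWidth().
public class ImpliedSpaceDemo {

    /** Returns true if the Euclidean distance between the end of the previous
     *  chunk and the start of the current chunk exceeds half a space width. */
    static boolean originalRuleInsertsSpace(float lastEndX, float lastEndY,
                                            float startX, float startY,
                                            float singleSpaceWidth) {
        float dx = lastEndX - startX;
        float dy = lastEndY - startY;
        // like lastEnd.subtract(start).length(): the sign of the offset is lost
        float spacing = (float) Math.sqrt(dx * dx + dy * dy);
        return spacing > singleSpaceWidth / 2f;
    }

    public static void main(String[] args) {
        float spaceWidth = 2.5f; // hypothetical width of a space glyph
        // "7" ends at x = 10; a real gap: the next chunk starts 2 units to the right
        boolean gap = originalRuleInsertsSpace(10f, 0f, 12f, 0f, spaceWidth);
        // "." moved back 2 units onto the "7": same distance, opposite direction
        boolean backstep = originalRuleInsertsSpace(10f, 0f, 8f, 0f, spaceWidth);
        System.out.println("gap -> space: " + gap);
        System.out.println("backstep -> space: " + backstep);
    }
}
```

Both lines print `true`, which matches the "7 ." output in the question.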

How can it be removed?

If you don't want this, you have to adjust the text extraction strategy you use accordingly.

In the current version, 5.5.9, the code looks like this:

// we only insert a blank space if the trailing character of the previous string wasn't a space,
// and the leading character of the current string isn't a space
if (result.charAt(result.length()-1) != ' ' && renderInfo.getText().length() > 0 && renderInfo.getText().charAt(0) != ' '){
    float spacing = lastEnd.subtract(start).length();
    if (spacing > renderInfo.getSingleSpaceWidth()/2f){
        appendTextChunk(" ");
        //System.out.println("Inserting implied space before '" + renderInfo.getText() + "'");
    }
}

The source of your ancient iText version probably looks similar here. This is where you have to change the logic so that no spaces are inserted for backsteps, or at least only for larger ones.
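One way to express that change is sketched below, again without iText. It assumes horizontal left-to-right text and chunks already known to be on the same line, so only the x offset matters; `insertSpace`, `ignoreBacksteps` and `BACKSTEP_FACTOR` are hypothetical names, and the factor is an untuned guess. Both options are shown: never inserting a space for a backstep, or inserting one only for backsteps larger than a multiple of the space width.

```java
// Sketch: treat backsteps (next chunk starting left of the previous chunk's end)
// differently from forward gaps.
// Assumptions: left-to-right horizontal text on one line, so only x matters;
// BACKSTEP_FACTOR is a hypothetical tuning constant, not an iText setting.
public class BackstepAwareSpacing {

    static final float BACKSTEP_FACTOR = 2f; // hypothetical: backsteps must be larger to count

    static boolean insertSpace(float lastEndX, float startX, float singleSpaceWidth,
                               boolean ignoreBacksteps) {
        float offset = startX - lastEndX; // > 0: forward gap, < 0: backstep/overlap
        if (offset >= 0) {
            return offset > singleSpaceWidth / 2f; // unchanged rule for forward gaps
        }
        if (ignoreBacksteps) {
            return false; // option 1: never insert a space for a backstep
        }
        return -offset > singleSpaceWidth * BACKSTEP_FACTOR; // option 2: only large backsteps
    }

    public static void main(String[] args) {
        float spaceWidth = 2.5f; // hypothetical width of a space glyph
        System.out.println(insertSpace(10f, 8f, spaceWidth, true));   // false: backstep ignored
        System.out.println(insertSpace(10f, 8f, spaceWidth, false));  // false: backstep too small
        System.out.println(insertSpace(10f, 12f, spaceWidth, false)); // true: real gap
    }
}
```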


As the OP explained in a comment, using

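float spacing = lastEnd.subtract(start).length();             // distance between the chunks, as before
float spaceWidth = renderInfo.getSingleSpaceWidth() * 3f/2f;  // larger threshold than the original /2f
float diffI1 = start.subtract(lastEnd).get(Vector.I1);        // x offset; > 0 means the new chunk lies to the right
// only insert a space for sufficiently large forward gaps, never for backsteps
if (spacing > spaceWidth && diffI1 > 0)
{
    result.append(" ");
}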

works well in his case. This does not mean, though, that one should generally change the strategy code this way, as it assumes writing oriented in the direction of the positive x axis. Furthermore, the optimal value of the constant by which renderInfo.getSingleSpaceWidth() is multiplied also depends on the document type at hand, cf. e.g. this case.
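If the positive-x assumption is a problem, one possible generalization is to measure the offset along the writing direction of the previous chunk (from its start to its end) instead of along the x axis. The sketch below does this with plain floats rather than iText's `Vector` class; `insertSpace` and its parameters are hypothetical, it folds the OP's two conditions into a single projected distance, and the factor 3/2 is simply the OP's value, not a recommended constant.

```java
// Sketch: direction-independent variant of the OP's check. The offset of the new
// chunk is projected onto the previous chunk's baseline direction (lastStart -> lastEnd).
// Assumptions: plain float coordinates stand in for iText's Vector objects;
// the factor is the OP's 3/2, not a general constant.
public class DirectionAwareSpacing {

    static boolean insertSpace(float lastStartX, float lastStartY,
                               float lastEndX, float lastEndY,
                               float startX, float startY,
                               float singleSpaceWidth, float factor) {
        float dirX = lastEndX - lastStartX;
        float dirY = lastEndY - lastStartY;
        float len = (float) Math.sqrt(dirX * dirX + dirY * dirY);
        if (len == 0f) {
            return false; // degenerate previous chunk: no direction to measure along
        }
        // signed distance of the new chunk's start from the previous chunk's end,
        // measured along the writing direction (positive = further ahead)
        float along = ((startX - lastEndX) * dirX + (startY - lastEndY) * dirY) / len;
        return along > singleSpaceWidth * factor;
    }

    public static void main(String[] args) {
        float w = 2.5f, factor = 1.5f; // hypothetical space width, OP's factor
        // left-to-right text: 5-unit gap ahead -> space
        System.out.println(insertSpace(0f, 0f, 10f, 0f, 15f, 0f, w, factor));
        // left-to-right text: backstep of 2 units -> no space
        System.out.println(insertSpace(0f, 0f, 10f, 0f, 8f, 0f, w, factor));
        // text written upwards (rotated 90 degrees): same 5-unit gap along +y -> space
        System.out.println(insertSpace(0f, 0f, 0f, 10f, 0f, 15f, w, factor));
    }
}
```

The calls print `true`, `false`, `true`: the rotated text is handled like the horizontal one, and the backstep does not produce a space.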

Community
mkl
  • Thanks for the explanation. I'm trying to do it this way: float diffI1= lastEnd.subtract(start).get(Vector.I1); if (diffI1>0) { appendTextChunk();} but it does not work. – pogdan May 26 '16 at 08:29
  • What happens instead? – mkl May 26 '16 at 13:02
  • The result is the same as before: http://185.49.12.119/~pogdan/7spacedot/7spacedot1.jpg; 127670. LELY EAST -> LEL Y EAST :( Code used: float spacing = lastEnd.subtract(start).length(); float spaceWidth= renderInfo.getSingleSpaceWidth()/2f; float diffI1= lastEnd.subtract(start).get(Vector.I1); if (spacing > spaceWidth && diffI1>0){ result.append(" "); } – pogdan May 26 '16 at 18:39
  • Bug: it should be diffI1<0. This code with a 3/2 ratio, float spaceWidth= renderInfo.getSingleSpaceWidth()*3f/2f; float diffI1= start.subtract(lastEnd).get(Vector.I1); if (spacing > spaceWidth && diffI1>0){ result.append(" "); }, works quite well. Thanks for your answer! – pogdan May 26 '16 at 20:37
  • @pogdan *working quite well* - great. In that case please consider marking the answer as accepted; you can do so by clicking the tick at the upper left of it. – mkl May 30 '16 at 10:11