I'm using PDFBox 2.0.22, and I also tried 3.0.0-RC1. Both show the same strange behavior.

With this code:

documentoIn = PDDocument.load(file);
documentoOut = new PDDocument();
for (PDPage page : documentoIn.getPages() )
{
    documentoOut.addPage(page);
}

In most cases it works OK, but for PDFs with a single page measuring 209,9x296,7mm the resulting page is roughly Letter size (215,9x279,4mm) and the content is cropped. This happens with several PDFs of this size.

With other sizes the result is fine and, even more fun, so is a PDF in which the 209,9x296,7mm page has been duplicated.

I haven't checked where the PDFs came from, but they may all have come from the same scanner, which should produce A4 pages but misses by a few millimeters.

Any thoughts?

Thanks, Cláudio

  • By simply adding pages from one document to another you may lose attributes that the page inherited in the original document. Try importing the page instead: `documentoOut.importPage(page)`. But have a look at the JavaDocs of that method; some additional actions may be required. – mkl May 19 '21 at 09:43
  • Hi mkl, Thanks for the feedback. Just tried importPage and the page still changes size. Damn thing :P (other things came out wrong, but I guess it's normal). Anyway, will check the JavaDocs for any clue. – Cláudio Tereso May 19 '21 at 09:51
  • If possible, please share the PDF in question for analysis. – mkl May 19 '21 at 09:57
  • Hi mkl, Yes, I'll do that. I was thinking that maybe the PDF is malformed and that when I use software to duplicate the pages, the software fixes the problem. If it's malformed and PDFBox can't get the size, it falls back to the default size: Letter. But that's strange, because I tried reading the size from the page and it was OK. As far as I can tell, the one I'm testing comes from Sage software. It's an invoice, so I can't share it now. I'll try to get one that I can share and that has the problem. – Cláudio Tereso May 19 '21 at 10:04
  • Damn, I can't find a PDF without sensitive data, and if I try to edit the PDF the resulting file doesn't show the error. There's no doubt the PDF is malformed but, as far as I can see, they come from different sources. I'll keep trying to find a PDF that I can share here, or I can share one privately if you are willing to give me a way to do it. :) – Cláudio Tereso May 19 '21 at 11:25
  • Have a look at both files with PDFDebugger, look for MediaBox and CropBox. – Tilman Hausherr May 19 '21 at 11:39
  • So, I've tried a problematic one-page file and the same file after being saved by pdfill. The original gets resized; the pdfill one doesn't. PDFDebugger gives this info for the files: https://www.dropbox.com/s/x4y8hi3bpvv6b2h/pdf%20metadata.png Problem: the MediaBox is out of place. Solution: ? :) – Cláudio Tereso May 19 '21 at 13:43
  • OK, this is my understanding: having the MediaBox in the pages root means all pages inherit it, but when I add the page to another PDF that info is lost. So I must check whether the page has its own MediaBox before copying it to the new PDF and, if it doesn't, copy the MediaBox from the source PDF onto the page; otherwise it will end up with the default Letter size in the destination PDF. Sounds OK? – Cláudio Tereso May 19 '21 at 13:58
  • Finally found this: https://stackoverflow.com/questions/37526904/page-is-cropped-in-new-document-in-pdfbox-while-copying An old question, it seems :) Thanks mkl and Tilman for your help. – Cláudio Tereso May 19 '21 at 14:22
  • importPage does that, so it's weird that it didn't work?! – Tilman Hausherr May 19 '21 at 15:08
  • There was more code involved, and just changing to importPage added other problems that I didn't get to try to solve at the time. If I can't solve it easily some other way, I'll check what else I need to do to get importPage working. – Cláudio Tereso May 19 '21 at 15:41
  • PS: Going back to the beginning of the thread, I just realized that mkl said the same thing I ended up finding by myself :P (I have a bad memory). Anyway, thank you both. With your help I learned a lot about PDFs and feel more confident about solving the problem (as soon as I have time). Thanks! – Cláudio Tereso May 19 '21 at 15:52
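
The inheritance mechanism worked out in the comments above can be sketched with a small toy model. This is plain Java and deliberately does not use PDFBox: `PageTree`, `Page`, `effectiveMediaBox` and `pinMediaBox` are hypothetical names invented for illustration, and the Letter fallback simply mirrors the default size described in the thread. The boxes are in PDF points, where 595 x 841 is roughly the 209,9x296,7mm page from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InheritedMediaBoxDemo {
    /** Fallback used in this model when no MediaBox is found: US Letter in points. */
    static final double[] LETTER = {612, 792};

    /** Hypothetical page-tree root; its mediaBox applies to pages that have none of their own. */
    static class PageTree {
        double[] mediaBox;                       // may be null
        final List<Page> pages = new ArrayList<>();

        void add(Page page) {
            page.parent = this;                  // re-parenting drops the link to the old tree
            pages.add(page);
        }
    }

    /** Hypothetical page; a null mediaBox means "inherit from the parent tree". */
    static class Page {
        double[] mediaBox;
        PageTree parent;

        double[] effectiveMediaBox() {
            if (mediaBox != null) return mediaBox;
            if (parent != null && parent.mediaBox != null) return parent.mediaBox;
            return LETTER;
        }
    }

    /** Writes the currently effective box onto the page itself so it survives a move. */
    static void pinMediaBox(Page page) {
        page.mediaBox = page.effectiveMediaBox().clone();
    }

    public static void main(String[] args) {
        PageTree source = new PageTree();
        source.mediaBox = new double[] {595, 841};   // set only on the tree root
        Page naive = new Page();
        Page pinned = new Page();
        source.add(naive);
        source.add(pinned);

        PageTree target = new PageTree();            // new document, no MediaBox on its root
        target.add(naive);                           // moved as-is: inherited box is lost
        pinMediaBox(pinned);                         // copy the inherited box onto the page first
        target.add(pinned);

        System.out.println(Arrays.toString(naive.effectiveMediaBox()));   // [612.0, 792.0]
        System.out.println(Arrays.toString(pinned.effectiveMediaBox()));  // [595.0, 841.0]
    }
}
```

In PDFBox 2.x terms, the pinning step would presumably be something like `page.setMediaBox(page.getMediaBox())` (and likewise for the crop box) before calling `addPage`. That mapping is an assumption to check against the JavaDocs and the linked question, not something verified in this thread.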

0 Answers