Reducing Java heap space consumption while PDF creation

Question

I've got a problem with iText.

I'm creating PDFs with lots of images, so the Java heap space runs out very easily.

I analyzed the heap dump with Eclipse Memory Analyzer and found that every image uses about 10 MB of heap space, even though each one is only about 350 KB on disk.

Is there a way to flush the heap to disk and then continue with the creation?

Are there other common leaks?

Unfortunately I found nothing useful yet.

[Image: what the heap looks like for one image]

In general I think that added elements remain in cache... how can I get them out?

Is something like this possible?

This is the code I'm currently using:

Document document = new Document();
PdfWriter writer = null;
try {
    writer = PdfWriter.getInstance(document, new FileOutputStream(this.savePath));
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (DocumentException e) {
    e.printStackTrace();
}

document.open();

Paragraph pdfTitle = new Paragraph();
pdfTitle.add(new Phrase("Title"));

try {
    document.add(pdfTitle);
    document.add(Chunk.NEWLINE);

} catch (DocumentException e) {
    e.printStackTrace();
}

for(int x = 0; x < 10; x++){
    //chapter
    Paragraph chapterName = new Paragraph("Chapter "+x, FONT[1]);
    ChapterAutoNumber chapter = new ChapterAutoNumber(chapterName);

    try {
    document.add(chapterhapter);
    } catch (DocumentException e) {
        e.printStackTrace();
    }

    for(int y = 0; y < 10; y++){
        //sec
        Paragraph sectionName = new Paragraph("Section "+y, FONT[2]);

        Section section  = chapter.addSection(sectionName);

        for(int z = 0; z < 10; z++){
            //subSec
            Section subSection = null;

            Image image = null;
            try {
            image = Image.getInstance(path);
            } catch (BadElementException e) {
                e.printStackTrace();
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }

            image.scalePercent(50);

            image.setCompressionLevel(9);
            Paragraph subDesc = new Paragraph("Desc "+z, FONT[3]);

            subSection = section.addSection(subDesc);

            picSection.add(image);

            try {
                document.add(subSection);
            } catch (DocumentException e) {
                e.printStackTrace();
            }

        }

    }

}

document.close();

Your only option may be to use more memory. A machine with 32 GB doesn't cost that much these days. ;) 32 GB of PC memory costs around $250. – Peter Lawrey Sep 06 '12 at 10:34
If I'm not able to create a 40-page PDF with 200 MB of cache, I won't try to use iText to generate 1000-page PDFs in parallel... – Franz Ebner Sep 10 '12 at 09:57
How many cores does each PDF generation use on average? How many cores do you have? Say the answers are 1.5 and 6; this means the optimal number of PDFs to generate concurrently is about 4 (6/1.5). – Peter Lawrey Sep 10 '12 at 09:59
From my point of view, which is maybe young and inexperienced, I admit, renting an EC2 farm is not the way to go for simple PDF creation. I wanna learn how to ride a bike, not drive a Ferrari... – Franz Ebner Sep 10 '12 at 10:05
Not sure that an EC2 farm is worth getting either. You can get a server with 2x hex-core CPUs and 48 GB of memory for $3500, or $90/month on lease. – Peter Lawrey Sep 10 '12 at 10:14
Have you checked [this post](http://stackoverflow.com/questions/5261422/pdf-compression-java)? – assylias Sep 10 '12 at 11:23
@assylias I don't want to reduce the file size; screenshots aren't that big... – Franz Ebner Sep 10 '12 at 11:41
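
The rule of thumb from the comments above (available cores divided by cores used per PDF) can be written down as a small calculation. This is only a sketch: the class and method names are made up for illustration, and the heap-based limit is an added assumption that uses the 600 MB per PDF figure reported elsewhere in this thread together with a hypothetical 2 GB heap.

```java
/** Rough estimate of how many PDF jobs to run in parallel (illustrative names). */
final class ConcurrencyEstimate {

    /** Jobs that fit the CPU: available cores divided by cores used per job. */
    static int byCpu(int cores, double coresPerJob) {
        return (int) Math.floor(cores / coresPerJob);
    }

    /** Jobs that fit the heap: usable heap divided by peak heap per job. */
    static int byHeap(long heapMb, long heapPerJobMb) {
        return (int) (heapMb / heapPerJobMb);
    }

    /** The tighter of the two limits wins. */
    static int maxParallelJobs(int cores, double coresPerJob, long heapMb, long heapPerJobMb) {
        return Math.min(byCpu(cores, coresPerJob), byHeap(heapMb, heapPerJobMb));
    }

    public static void main(String[] args) {
        // 6 cores and 1.5 cores per PDF come from the comment above;
        // 2048 MB of heap is an assumed value, 600 MB per PDF is the figure reported in this thread.
        System.out.println("CPU limit:  " + byCpu(6, 1.5));
        System.out.println("Heap limit: " + byHeap(2048, 600));
        System.out.println("Run at most " + maxParallelJobs(6, 1.5, 2048, 600) + " jobs in parallel");
    }
}
```

With these inputs the CPU would allow 4 jobs but the heap only 3, which is why reducing the memory needed per PDF matters more here than adding cores.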

score 2 · Accepted Answer · answered Sep 10 '12 at 11:45

I'm the original developer of iText, and I've downvoted your question because your code is all wrong.

For instance: you create a chapter object, but you're not adding it to the document ever. Instead you're adding a picSection object that isn't defined anywhere.

My main criticism, however, is the fact that you're using the ChapterAutoNumber object, which implements the LargeElement interface, and complain about memory use. That's like saying: every day I eat a jar of mayonnaise, I wonder: how come I'm so fat?

Why are you using Chapter/Section? If bookmarks are the main reason for choosing these objects, you should switch to using PdfOutline if you want to reduce the memory used. Because now, you're building up a huge pile of objects by adding them to the Chapter object, and these objects can only be released at the moment you add the chapter to the document. Before that moment, it's no use to do garbage collecting because the garbage collector can't throw away the content stored in the Chapter object.

If you are addicted to using the Chapter class, take a look at the setComplete() method, and add small portions of the chapter to the document on a regular basis, so that objects can be released little by little. The first approach (not using the Chapter class) is far better than this second one.
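
To make the difference concrete, here is a toy model (plain Java, not iText code) that counts how many elements are held in memory at the same time. The class name and the `flushEvery` parameter are hypothetical; `flushEvery` stands in for how often content is handed over to the document so that it can be released.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not iText code): how many elements are retained at once. */
final class RetentionModel {

    /**
     * Adds {@code total} elements to a buffer, empties the buffer every
     * {@code flushEvery} elements, and returns the largest number of
     * elements that were held at the same time.
     */
    static int peakRetained(int total, int flushEvery) {
        List<byte[]> buffer = new ArrayList<>();
        int peak = 0;
        for (int i = 0; i < total; i++) {
            buffer.add(new byte[16]);             // stand-in for an image or paragraph
            peak = Math.max(peak, buffer.size());
            if (buffer.size() >= flushEvery) {
                buffer.clear();                   // handed over: the buffer no longer references the elements
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int elements = 1000; // 10 chapters x 10 sections x 10 subsections, as in the question
        System.out.println("everything buffered first: " + peakRetained(elements, elements));
        System.out.println("flushed in portions of 100: " + peakRetained(elements, 100));
        System.out.println("added straight to the document: " + peakRetained(elements, 1));
    }
}
```

The model reports peaks of 1000, 100 and 1 elements respectively. Real numbers depend on what iText keeps internally, so treat this only as an illustration of why content that is buffered in a container cannot be garbage collected until the container lets go of it.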

I may decide to remove the Chapter/Section classes from iText if I see more questions like this.

Bruno Lowagie

As a newbie I was wondering if making programming mistakes is a good reason for downvotes. Shouldn't downvotes refer to the way questions are posed? – Andrea Casaccia Sep 10 '12 at 11:53
First of all, I'm very pleased with your answer. I'm sorry for this little bug; I made an example and renamed the whole thing... I tried to reuse this [example](http://itextpdf.com/examples/iia.php?id=48). Is there an example of how not to use this "jar of mayonnaise"? thx – Franz Ebner Sep 10 '12 at 11:57
As for the downvote: good questions don't contain code that can't be compiled, unless the problem is that they don't compile. – Bruno Lowagie Sep 10 '12 at 12:55
It seemed logical to me that every bookmark has content attached to it in the file, so that's a whole new philosophy now... Does the content have to be inserted as paragraphs? And is "Paragraph + PdfOutline == Chapter"? – Franz Ebner Sep 10 '12 at 13:07
Not sure if I understand the question. Instead of adding content to Chapters and Sections, you just add the content straight to the document (that way, they can be released as soon as a page is full). Whenever you want a bookmark to appear, you also create a PdfOutline object, and you add it to the appropriate parent: either the root outline or a PdfOutline object you've created yourself. – Bruno Lowagie Sep 10 '12 at 13:25
Thank you, I'm going to award you the bounty in a few hours. Maybe some more freely available descriptions of your examples would reduce the effort (which I really appreciate) you spend answering questions like this – Franz Ebner Sep 10 '12 at 13:30

score 1 · Answer 2 · edited May 23 '17 at 11:55

There are some useful answers here: Java heap space out of memory

Try setting the images to null after appending them to PDF?

answered Sep 06 '12 at 10:39

anon

score 1 · Answer 3 · answered Sep 07 '12 at 06:36

My first guess would be to search iText documentation for some kind of streaming support. Is there some way not to store the whole PDF being created in memory? RTFM.

The second option is certainly to increase the heap size for your application, if you have appropriate hardware for that.
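
On a standard JVM the maximum heap is set with the `-Xmx` option, for example `java -Xmx1g -jar pdf-generator.jar` (the jar name is hypothetical). A minimal sketch for checking from inside the application what limit is actually in effect, using only `java.lang.Runtime`:

```java
/** Prints the heap limit the JVM is running with and the heap currently in use. */
final class HeapInfo {

    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently occupied (allocated minus free), in megabytes. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMb() + " MB");
        System.out.println("used heap: " + usedHeapMb() + " MB");
    }
}
```

Logging these two values before and after generating one PDF gives a rough idea of how much heap a single document needs.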

And, just in case, I should mention a memory leak possibility. Although in your case it does seem unlikely, there is Plumbr if you need it :)

Nikem

I created a 40-page PDF with 40 images and needed about 600 MB. I want to do that in parallel with even more pages, so that's not an option. The "FM" costs about 40 bucks – Franz Ebner Sep 07 '12 at 07:14
Why is using more memory not an option? E.g. using an Amazon EC2 machine with enough memory? – Nikem Sep 07 '12 at 08:03
Of course it's an option, but I'm sure it's better, and possible, to reduce the consumption instead – Franz Ebner Sep 07 '12 at 08:26
Btw, isn't 10 MB rather a lot for a picture? I imagine that if you insert a picture into a PDF document, its resolution is somewhat limited. Assuming the PDF is in A4 format, 10 MB makes almost 1000 dpi. Overkill? Maybe you can reduce the size of your images? – Nikem Sep 07 '12 at 10:42
The image files are about 100 KB each (screenshots)... I don't know where this amount comes from... – Franz Ebner Sep 07 '12 at 10:58

score 1 · Answer 4 · edited May 23 '17 at 12:26

The reason the 100kB image takes up so much memory is probably because it is compressed on disk and uncompressed (raw) in memory.
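
A quick way to see the gap is to compare the raw pixel size with the encoded size. The sketch below assumes a 1920x1200 screenshot (the real dimensions are not given in the question) and uses a blank image, which compresses far better than a real screenshot would, so only the order of magnitude is meaningful. The class and method names are illustrative.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Compares uncompressed pixel data with the size of the same image encoded as PNG. */
final class DecodedImageSize {

    /** Bytes needed to hold the uncompressed pixels. */
    static long rawBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Encodes a blank image as PNG in memory and returns the encoded size in bytes. */
    static long pngBytes(int width, int height) {
        ImageIO.setUseCache(false); // keep everything in memory, no temp files
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(img, "png", out)) {
                throw new IllegalStateException("no PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    public static void main(String[] args) {
        int w = 1920, h = 1200; // assumed screenshot dimensions
        System.out.println("raw, 3 bytes/pixel: " + rawBytes(w, h, 3) + " bytes");
        System.out.println("raw, 4 bytes/pixel: " + rawBytes(w, h, 4) + " bytes");
        System.out.println("PNG (blank image):  " + pngBytes(w, h) + " bytes");
    }
}
```

At 4 bytes per pixel the assumed screenshot needs 9,216,000 bytes, which is in the same range as the roughly 10 MB per image seen in the heap dump.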

Java should garbage collect these objects and cope with this, unless you hold too many of them at the same time and run out of memory.

This question covers cleaning up used memory: Java Heap Overflow, Forcing Garbage Collection

Sometimes re-using objects reduces memory usage. Can you re-use the last PDF/Image object and load the next one into it, instead of creating a new object each time?
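
One form of re-use that fits the question's loop, where the same path is loaded on every iteration, is to load each distinct file only once and hand out the same object afterwards. The sketch below is generic Java with a hypothetical loader; whether the PDF library then stores a shared object only once is an assumption you would need to verify. Note that a cache keeps its entries alive, so this only helps when the same file really recurs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Loads each distinct path once and returns the same object on later requests. */
final class LoadOnceCache<T> {

    private final Map<String, T> cache = new HashMap<>();
    private final Function<String, T> loader;
    private int loads = 0;

    LoadOnceCache(Function<String, T> loader) {
        this.loader = loader;
    }

    /** Returns the cached object for the path, loading it on first use. */
    T get(String path) {
        T cached = cache.get(path);
        if (cached == null) {
            cached = loader.apply(path);
            loads++;
            cache.put(path, cached);
        }
        return cached;
    }

    /** Number of times the loader actually ran. */
    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        // Hypothetical loader: a byte array stands in for a decoded image.
        LoadOnceCache<byte[]> images = new LoadOnceCache<>(path -> new byte[1024]);
        for (int i = 0; i < 1000; i++) {
            images.get("screenshot.png");
        }
        System.out.println("loader ran " + images.loads() + " time(s) for 1000 requests");
    }
}
```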
