1

I'm dealing with high volume data. I want to generate a PDF report which contains 1 year data. I'm using html-pdf to do that. The problem is that it is taking around 20 minutes to generate the PDF. Can I decrease it?

I don't want to make my user wait for 20 minutes.

Note:I want to generate PDF on server side only and I want to mail that to user.

I tried for some solutions,but I didn't fond any. But what I observed is that when I'm downloading some file from google, the workflow is very smooth.

I want to know how can I make my workflow smooth

 app.post('/api', (req, res) => {
    var json = req.body;
    var sqlGet = 'SELECT StationId,TimeStamp,data->>"$.temperature" as temperature,data->"$.humidity" as humidity,data->"$.so2" as so2,data->"$.no2" as no2,data->"$.co" as co,data->"$.o3" as o3,data->"$.co2" as co2,data->"$.pm10" as pm10,data->"$.noise" as noise,data->"$.rain" as rain,data->"$.wdir" as wdir,data->"$.wspeed" as wspeed,data->"$.press" as press FROM TwentyStationFiveYearData WHERE StationId = ' + json.stationid + ' AND TimeStamp >= ' + json.start + ' AND TimeStamp <= ' + json.end;
    console.log('' + sqlGet);
    connection.query(sqlGet, function(err, result) {
        if (err) {
            console.log('error' + err);
        } else {
            var htmlCode = generateTable(result, 15);
            //var output = fs.createWriteStream('reports/' + json.start + '-' + json.end + '.pdf');
            pdf.create(htmlCode, {
                orientation: 'landscape',
                type: 'pdf',
                timeout: '1800000'
            }).toStream(function(err, stream) {
                if (err) {
                    console.log('stream did not work ' + err);
                } else {
                    stream.pipe(fs.createWriteStream('output.pdf'));
                    console.log('pdf generated');
                    res.send("report will be mailed");
                }
            });
        }
    });
});

function generateTable(jsonData, columns) {
    var tab = '';
    var headers = ['StationId', 'TimeStamp', 'temperature', 'humidity', 'so2', 'no2', 'co', 'o3', 'co2', 'pm10', 'noise', 'rain', 'wdir', 'wspeed', 'press'];
    tab += '<tr>';

    for (var c = 0; c < columns; c++) {
        tab += '<th>';
        tab += headers[c];
        tab += '</th>';
    }
    tab += '<tr>';
    jsonData.forEach(obj => {
        tab += '<tr>';
        for (var c = 0; c < columns; c++) {
            tab += '<td>';
            tab += obj[headers[c]];
            tab += '</td>';
        }
        tab += '</tr>';
    });
    tab = '<table border = "1">' + tab + '</table>';
    return html(tab);
}

function html(table) {
    var htmlCode = '<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>Title</title></head><body>' + table + '</body></html>';
    return htmlCode;

}

I want to make my workflow smoother and to decrease the time taken for PDF report generation

O. Jones
  • 103,626
  • 17
  • 118
  • 172
Sanjay
  • 81
  • 11
  • You have a composite index on (StationId,TimeStamp)? – Strawberry Jun 08 '19 at 07:52
  • yes,i have done indexing on stationid,timestamp,stationid and timestamp..these are the 3 types of index that i have – Sanjay Jun 08 '19 at 07:56
  • You should have one, composite index on station_id,timestamp. You could also have a separate index on timestamp. An index on station_id alone is redundant – Strawberry Jun 08 '19 at 08:01
  • right now,that's not the problem.Iam querying using both stationid and timestamp parameters which means its gonna use the stationid and timestamp index. – Sanjay Jun 08 '19 at 08:07
  • I want to know any suggestions to decrease the pdf generation time – Sanjay Jun 08 '19 at 08:07
  • yes.I know that the performance issue is present in the html-pdf conversion.But i am not able to get what to do to improve the performance.That is why i posted the complete details so that someone can help me.what iam expecting is that,suggestions of some libraries which have greater performance than html-pdf or any other suggestions of how to generate pdfs in a more efficient way – Sanjay Jun 08 '19 at 11:36
  • It's best on SO to be very specific in your question. You originally tagged your question with `mysql` so the active `mysql` contributors focused, incorrectly, on that part of your question. – O. Jones Jun 08 '19 at 14:32
  • `html-pdf` uses `phantom-js` to generate the pdf. Have you tried other render methods such as headless chrome used by `puppeteer`. Based on the data and layout other methods might yield different performance. – Easwar Jun 08 '19 at 14:48

3 Answers3

0

Have you narrowed down the time-consuming operations? is it the pdf rendering the most time-consuming?

What I would do is check the raw query to see how much time it takes to be executed.

Secondly, I would check the size of the result and if it can be compressed somehow and finally, I would check the PDF rendering.

I have used dompdf for PHP and there was a performance issue when I was rendering png instead of jpg but I suppose this is not the case as I don't see any image rendering.

So if flexible to use another library you better do it or else check for performance tuning in the one you use.

I suppose the font rendering can take a lot of time.

Hope I gave you some hints!

Konstantinos A
  • 126
  • 2
  • 6
0

npm's html-pdf uses phantomjs . When you use it, it makes a displayless web browser instance in a subprocess, then uses it to render the html data you give it, then uses the same instance to print it to a pdf file. That works. But as you have discovered, it's slow when you give it a large html object.

Still, 20 minutes is a long time. Is it possible your machine is RAM-constrained? Is the phantomjs subprocess running out of RAM and thrashing?

Maybe pdf-puppeteer is faster. It's worth a try, and it's more future-proof than relying on the now-deprecated phantomjs.

You might try PDFKit. It generates pdf files directly from within your node program. That will be faster because it needs no subprocess. But, you must rework your report program so it uses PDFKit's API rather than a browser instance.

There are multiple software-as-a-service solutions available that work like html-pdf but faster. You might consider using one of those, especially if quick delivery of pdf files is critical to your business.

Finally, even though you don't want to do it, you can write your HTML and CSS so it does a really good job formatting for printing. And ask your user to print.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • I need tables in the output pdf which is not present in PDFKit.so i cant use that.I will give pdf-puppeteer a try.. – Sanjay Jun 10 '19 at 07:45
0

pdf.create(htmlCode, { orientation: 'landscape', type: 'pdf', timeout: '1800000'

You have set the timeout as 1800000 milliseconds, so its is taking long time to generate pdf. So

var options = { format: "A3", orientation: "landscape", border: "10mm", timeout: "3000" };

var document = {
    type: 'buffer',
    template: html,
    context: {
        Wellsfargo: wellsfargo
    },
};


if (document === null) {
    return null;

} else {
    pdf.create(document, options).then(response => {
        res.writeHead(200, {
            "Content-Disposition": "attachment;filename=" + "wellsFargo.pdf",
            'Content-Type': 'application/pdf'
        });
        return res.end(response);
    })
    .catch(error => {
        console.error(error);
        return res.status(200).type('application/pdf').send(error.toString());
    });
};

});

try these timeout or remove the timeout..

because phanthomjs default timeout is 3000milliseconds.

use dynamic-html-pdf package it is very fast..

https://www.npmjs.com/package/dynamic-html-pdf

Aravind A
  • 23
  • 1
  • 7
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 16 '22 at 14:50