
I found some repos, but they do not look like they are still maintained.

I tried the LibreOffice approach, but the PDF output is so bad that it is not usable (text ends up on different pages, etc.).

If possible, I would like to avoid starting any background processes and/or saving the file on the server. Ideally, the solution would work with buffers. For privacy reasons, I cannot use any external service.

doc buffer -> pdf buffer

Question:

How can I convert DOC files to PDF in Node.js?

Andi Giga

6 Answers


For those who might stumble on this question nowadays:

There is a cool tool called Gotenberg: a Docker-powered stateless API for converting HTML, Markdown and Office documents to PDF. It supports converting DOC files via unoconv.

And I happen to be the author of a JS/TS client for Gotenberg: gotenberg-js-client

You are welcome to use it :)

UPD:
Gotenberg has a new website now: https://gotenberg.dev
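For the buffer-in, buffer-out requirement in the question, a minimal sketch of calling Gotenberg directly over HTTP (without gotenberg-js-client) could look like this. It assumes Node.js 18+ (global fetch, FormData and Blob) and a Gotenberg 7 or 8 container reachable at http://localhost:3000, for example one started with docker run --rm -p 3000:3000 gotenberg/gotenberg:8. The /forms/libreoffice/convert route and the files form field follow the Gotenberg docs for those versions; buildGotenbergRequest and docBufferToPdfBuffer are hypothetical helper names.

```javascript
// Sketch: send a DOC/DOCX buffer to a Gotenberg container and get a PDF buffer back.
// Assumptions: Node.js 18+ (global fetch/FormData/Blob) and Gotenberg 7/8
// listening on GOTENBERG_URL (defaults to http://localhost:3000).
const GOTENBERG_URL = process.env.GOTENBERG_URL || 'http://localhost:3000';

// Builds the multipart request for Gotenberg's LibreOffice route.
// Keep a real extension (.doc/.docx) in fileName; Gotenberg checks it
// against its list of supported formats.
function buildGotenbergRequest(docBuffer, fileName, baseUrl = GOTENBERG_URL) {
  const form = new FormData();
  form.append('files', new Blob([docBuffer]), fileName);
  return {
    url: `${baseUrl}/forms/libreoffice/convert`,
    init: { method: 'POST', body: form },
  };
}

// Converts a document buffer to a PDF buffer; nothing is written to local disk.
async function docBufferToPdfBuffer(docBuffer, fileName = 'document.docx') {
  const { url, init } = buildGotenbergRequest(docBuffer, fileName);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Gotenberg returned ${res.status}: ${await res.text()}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```

Usage: const pdfBuffer = await docBufferToPdfBuffer(uploadedBuffer, 'upload.docx'); The conversion itself runs inside the container, so your Node.js process only deals with buffers.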

yumaa
  • Is this doable using serverless? I would love to use it, but I'm using Firebase Cloud Functions, and from what I read I need to run Gotenberg on a separate instance, so my second question is whether it's scalable. Thanks! – jean d'arme Feb 20 '21 at 23:17
  • @jeand'arme The Gotenberg container has to run somewhere; as far as I know there is no public instance, unfortunately, so you should run it within your own infrastructure. Regarding scalability, Gotenberg is stateless, so you can scale it as much as you want. There is a section about scalability in the Gotenberg documentation: https://thecodingmachine.github.io/gotenberg/#scalability – yumaa Feb 21 '21 at 09:41
  • @yumma Thanks for the link! I've now run into a different problem: how to deploy it on Google Cloud Run. I even asked a question about it here: https://stackoverflow.com/questions/66316490/how-to-pull-docker-hub-image-to-google-cloud-run I would be grateful if you have any solutions for that. – jean d'arme Feb 22 '21 at 13:20
  • @yumma It works. On average it takes 10-25 seconds to convert a simple docx into PDF (I tried multiple versions, and 4k of RAM with 2 CPUs seems to work best, sometimes under 7 seconds). Thanks for sharing this lib! – jean d'arme Feb 23 '21 at 21:15
4

While creating an application, I needed to convert doc or docx files uploaded by users into PDF for further analysis. I used the npm package libreoffice-convert for this. libreoffice-convert requires LibreOffice to be installed on your Linux machine. Here is the sample code I used; it is written in JavaScript for a Node.js application.

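const libre = require('libreoffice-convert');
const path = require('path');
const fs = require('fs').promises;
const { promisify } = require('util');
// libre.convert(buffer, format, filter, callback) turned into a promise-returning function
const libConvert = promisify(libre.convert);

async function convert(name = 'myresume.docx') {
  try {
    // path.parse keeps names like "my.resume.docx" intact ("my.resume")
    const baseName = path.parse(name).name;
    const enterPath = path.join(__dirname, 'public', 'Resume', name);
    const outputPath = path.join(__dirname, 'public', 'Resume', `${baseName}.pdf`);
    // Read the uploaded document
    const data = await fs.readFile(enterPath);
    // Convert to PDF; undefined means no explicit export filter is passed
    const done = await libConvert(data, '.pdf', undefined);
    await fs.writeFile(outputPath, done);
    return { success: true, fileName: baseName };
  } catch (err) {
    console.error(err);
    return { success: false };
  }
}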

In my experience, the resulting PDF is of very good quality.
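Since the question asks for doc buffer -> pdf buffer, the file reads and writes above can be dropped. Here is a sketch assuming the same libre.convert(buffer, format, filter, callback) call used above; the converter module is passed in as a parameter (createDocToPdf is a hypothetical helper) so the wrapper can be tried without LibreOffice installed. Note that, as far as I know, libreoffice-convert itself still writes temporary files and starts soffice under the hood, so this avoids your own disk I/O but not the external process.

```javascript
const { promisify } = require('util');

// Returns an async (docBuffer) => pdfBuffer function.
// `libre` is expected to expose convert(buffer, format, filter, callback),
// as libreoffice-convert does; pass require('libreoffice-convert') in real code.
function createDocToPdf(libre) {
  const convertAsync = promisify(libre.convert);
  return async function docToPdf(docBuffer) {
    // '.pdf' is the target format; undefined means no explicit export filter
    return convertAsync(docBuffer, '.pdf', undefined);
  };
}

// Usage (requires LibreOffice and the libreoffice-convert package):
// const docToPdf = createDocToPdf(require('libreoffice-convert'));
// const pdfBuffer = await docToPdf(uploadedDocxBuffer);
```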

shubham singh
  • This should be marked as the accepted answer. The only missing detail is: const { promisify } = require('bluebird'); – Andreas May 25 '20 at 17:32
  • Is this doable in serverless functions like Google Cloud Functions? Looks really good, would love to use it – jean d'arme Feb 20 '21 at 23:32
  • If Google Cloud Functions work similarly to AWS Lambda, then yes. We need to zip libreoffice-convert and upload it to our function so that we can use it. – shubham singh Aug 18 '21 at 04:21

To convert a document into PDF we can use Universal Office Converter (unoconv) command line utility.

It can be installed with your OS's package manager. For example, to install it on Ubuntu with apt-get:

sudo apt-get install unoconv

As per the unoconv documentation:

If you installed unoconv by hand, make sure you have the required LibreOffice or OpenOffice packages installed

The following example demonstrates how to invoke the unoconv utility:

unoconv -f pdf sample_document.py

It generates a PDF document containing the content of sample_document.py.

If you want to use it from a Node.js program, you can invoke the command through a child process.

The code below demonstrates how to use a child process to run unoconv and create a PDF:

const util = require('util');
const execFile = util.promisify(require('child_process').execFile);

async function createPDFExample() {
  // execFile passes the arguments directly, without going through a shell
  const { stdout, stderr } = await execFile('unoconv', ['-f', 'pdf', 'sample.js']);
  console.log('stdout:', stdout);
  console.log('stderr:', stderr);
}

createPDFExample().catch(console.error);
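To avoid temporary files on your side, unoconv also has --stdin and --stdout options (an assumption worth confirming with unoconv --help for your version). The sketch below writes a buffer to a child process's stdin and collects its stdout into a single Buffer; runWithBuffer and UNOCONV_PDF_ARGS are hypothetical names, and the exact unoconv arguments are an assumption.

```javascript
const { spawn } = require('child_process');

// Assumed unoconv arguments for buffer-to-buffer conversion (verify with `unoconv --help`).
const UNOCONV_PDF_ARGS = ['--stdin', '--stdout', '-f', 'pdf'];

// Spawns `command`, writes `input` to its stdin and resolves with everything
// it printed to stdout as one Buffer. Rejects on spawn errors or a non-zero exit code.
function runWithBuffer(command, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const out = [];
    const err = [];
    child.stdout.on('data', (chunk) => out.push(chunk));
    child.stderr.on('data', (chunk) => err.push(chunk));
    child.on('error', reject); // e.g. ENOENT when the command is not installed
    child.stdin.on('error', reject); // e.g. EPIPE if the process exits before reading input
    child.on('close', (code) => {
      if (code === 0) resolve(Buffer.concat(out));
      else reject(new Error(`${command} exited with ${code}: ${Buffer.concat(err).toString()}`));
    });
    child.stdin.end(input); // send the whole document, then close stdin
  });
}

// Usage (requires unoconv and LibreOffice):
// const pdfBuffer = await runWithBuffer('unoconv', UNOCONV_PDF_ARGS, docBuffer);
```

Note that this still starts an external process per conversion; it only removes the need to save files yourself.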
GauravLuthra
  • I followed this path, but on Amazon Linux it gave me a really hard time to set up unoconv with its dependencies, as yum doesn't have all the packages, and manual installation also wasted a lot of my time. – omair azam Apr 27 '19 at 18:24

Posting a slightly modified version for Excel, based on the answer provided by @shubham singh. I tried it and it worked perfectly.

    const fs = require('fs').promises;
    const path = require('path');
    const { promisify } = require('bluebird');
    const libre = require('libreoffice-convert');
    const libreConvert = promisify(libre.convert);

    // await is only valid inside an async function in CommonJS modules
    async function convertExcelToPdf() {
      // directory of the entry script (require.main replaces the deprecated process.mainModule)
      const workDir = path.dirname(require.main.filename);
      // read excel file
      const data = await fs.readFile(path.join(workDir, 'my_excel.xlsx'));
      // create pdf file from excel
      const pdfFile = await libreConvert(data, '.pdf', undefined);
      // write new pdf file to directory
      await fs.writeFile(path.join(workDir, 'my_pdf.pdf'), pdfFile);
    }

    convertExcelToPdf().catch(console.error);
Andreas
  • This works, but it is not concurrent and can convert only one file at a time. What if multiple users hit the API at the same time? I have implemented this using Node + Express, but this is a drawback: no concurrent conversion. – Rohit Daftari Jan 20 '22 at 09:26
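On the concurrency question: one generic option is a small in-process queue that accepts every request but caps how many conversions run at the same time. This sketch is not part of libreoffice-convert; createLimiter is a hypothetical helper, and a sensible limit depends on your CPU and RAM (whether parallel LibreOffice runs behave well in your environment is worth testing).

```javascript
// Generic promise queue: at most `maxConcurrent` tasks run at once, the rest wait in FIFO order.
function createLimiter(maxConcurrent) {
  let active = 0;
  const queue = [];

  // Start the next queued task if a slot is free.
  function next() {
    if (active >= maxConcurrent || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next();
      });
  }

  // Queue a task (a function returning a promise) and get a promise for its result.
  function limit(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
  }

  // Snapshot of running and waiting tasks, handy for logging or metrics.
  limit.stats = () => ({ active, pending: queue.length });
  return limit;
}
```

Usage, with libreConvert from the answer above: create one limiter per process with const limit = createLimiter(2); and inside each request handler call const pdf = await limit(() => libreConvert(data, '.pdf', undefined)); so extra requests wait instead of failing.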

docx-pdf: a library that converts DOCX files to PDF.

Installation:

npm install docx-pdf --save

Usage:

const docxConverter = require('docx-pdf');

docxConverter('./input.docx', './output.pdf', function (err, result) {
  if (err) {
    console.log(err);
    return; // stop here; result is not usable after an error
  }
  console.log('result', result);
});

In short, the signature is docxConverter(inputPath, outputPath, callback), where the callback receives (err, result).

The converted PDF (output.pdf in the example above) is written to the output path you provide.

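const { spawn } = require('child_process');

const inputFilePath = 'input.docx'; // path of the document to convert
// Writes input.pdf to the current directory (add '--outdir', dir to change that)
const soffice = spawn('soffice', ['--headless', '--convert-to', 'pdf', inputFilePath]);
soffice.on('close', (code) => console.log(`soffice exited with code ${code}`));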
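The spawn call above returns immediately and does not let the caller await the conversion. A promise-based sketch that waits for soffice to exit and resolves with the path of the generated PDF, assuming soffice keeps the input's base name (input.docx becomes input.pdf) in the --outdir directory; sofficePdfArgs, expectedPdfPath and sofficeToPdf are hypothetical helpers.

```javascript
const { spawn } = require('child_process');
const path = require('path');

// Builds the soffice command line; --outdir controls where the PDF is written.
function sofficePdfArgs(inputFilePath, outDir) {
  return ['--headless', '--convert-to', 'pdf', '--outdir', outDir, inputFilePath];
}

// Assumption: soffice keeps the input's base name and swaps the extension for .pdf.
function expectedPdfPath(inputFilePath, outDir) {
  return path.join(outDir, `${path.parse(inputFilePath).name}.pdf`);
}

// Runs soffice and resolves with the expected output path once it exits with code 0.
function sofficeToPdf(inputFilePath, outDir = path.dirname(inputFilePath)) {
  return new Promise((resolve, reject) => {
    const child = spawn('soffice', sofficePdfArgs(inputFilePath, outDir));
    child.on('error', reject); // e.g. ENOENT when soffice is not on PATH
    child.on('close', (code) => {
      if (code !== 0) return reject(new Error(`soffice exited with code ${code}`));
      resolve(expectedPdfPath(inputFilePath, outDir));
    });
  });
}

// Usage (requires LibreOffice with soffice on PATH):
// const pdfPath = await sofficeToPdf('/tmp/uploads/report.docx');
```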
Tyler2P
  • First, LibreOffice must be installed on your system; then add the 'soffice' command to your PATH environment variable. After completing these steps, you can simply run the lines above. – nadgeSachin Apr 03 '23 at 21:09
  • Please don't post only code as answer, but also provide an explanation what your code does and how it solves the problem of the question. Answers with an explanation are usually more helpful and of better quality, and are more likely to attract upvotes. – Mark Rotteveel Apr 08 '23 at 10:36