I've been trying to implement PDF parsing in my Next.js app. The pdf2json and pdf-parse libraries don't seem to work with the new Next.js App Router.

Steps to reproduce:

  1. Run npx create-next-app@latest, follow the prompts, and answer Yes to using the App Router.
  2. Add an API route under app/api/test:
import { NextResponse } from "next/server";
import fs from "fs";
import PDFParser from "pdf2json";
import pdf from "pdf-parse";

export async function GET() {
  const pdfParser = new PDFParser();

  pdfParser.on("pdfParser_dataError", (errData: any) =>
    console.error(errData.parserError)
  );
  pdfParser.on("pdfParser_dataReady", (pdfData: any) => {
    console.log(pdfData);
  });

  pdfParser.loadPDF("./sample.pdf");
  return NextResponse.json({});
}
  3. Add a sample.pdf file in the root dir.
  4. Run curl localhost:3000/api/test from a terminal. pdf2json throws an uncaught error:
- error node_modules/pdf2json/lib/pdf.js (66:0) @ eval
- error Error [ReferenceError]: nodeUtil is not defined
  5. Trying pdf-parse instead returns a 404 Not Found for the API route:
import { NextResponse } from "next/server";
import fs from "fs";
import PDFParser from "pdf2json";
import pdf from "pdf-parse";

export async function GET() {
  let dataBuffer = fs.readFileSync("./sample.pdf");

  pdf(dataBuffer).then(function (data) {
    // number of pages
    console.log(data.numpages);
    // number of rendered pages
    console.log(data.numrender);
    // PDF info
    console.log(data.info);
    // PDF metadata
    console.log(data.metadata);
    // PDF.js version
    // check https://mozilla.github.io/pdf.js/getting_started/
    console.log(data.version);
    // PDF text
    console.log(data.text);
  });
  return NextResponse.json({});
}

After creating a separate project with the old Pages Router in Next.js, none of the above issues occurred and the PDF was parsed properly.

Is there anything I am missing here?

Andrew Luo

2 Answers

You need to add a file at test/data/05-versions-space.pdf, creating the test/data folder if it does not exist.

I know this is extremely random, but if you look into the pdf-parse code you will see that it needs this file. It can be any PDF, but the path and name have to match exactly.
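
As a rough sketch of that workaround, the placeholder can be created by copying any PDF you already have into place. The helper name `ensurePlaceholderPdf` and its parameters are made up for illustration and are not part of pdf-parse or Next.js. The sketch also assumes the path is resolved relative to the project root.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Relative path the answer above says pdf-parse expects to find.
const PLACEHOLDER = path.join("test", "data", "05-versions-space.pdf");

/**
 * Hypothetical helper: copies an existing PDF to the placeholder location
 * under `projectRoot`, creating test/data first if it is missing.
 * Leaves an existing placeholder untouched and returns its path.
 */
function ensurePlaceholderPdf(projectRoot: string, sourcePdf: string): string {
  const target = path.join(projectRoot, PLACEHOLDER);
  if (!fs.existsSync(target)) {
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.copyFileSync(sourcePdf, target);
  }
  return target;
}
```

Calling `ensurePlaceholderPdf(process.cwd(), "sample.pdf")` once from a setup script, before starting the dev server, should be enough for the sample project in the question.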

frankBang
You need to update your next.config.js file so that pdf-parse is treated as an external server package:

/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    serverComponentsExternalPackages: ["pdf-parse"],
  },
};

module.exports = nextConfig;
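
For readers on newer releases: as far as I know, Next.js 15 renamed this option to a stable top-level `serverExternalPackages` key. A sketch of the equivalent config, assuming Next.js 15 or later and a config file that uses ES module syntax (for example next.config.ts or next.config.mjs):

```typescript
// Sketch only: assumes Next.js 15+, where the option moved out of `experimental`.
const nextConfig = {
  // Takes the place of experimental.serverComponentsExternalPackages in older versions.
  serverExternalPackages: ["pdf-parse"],
};

export default nextConfig;
```

On Next.js 13 and 14, the `experimental.serverComponentsExternalPackages` form shown in the answer is the one to use.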