I have a Node.js app that uses pdftotext from poppler-utils to parse a PDF after it has been uploaded and stored at a remote location. The command being run is:
pdftotext -layout https://example.com/myfile.pdf -
This writes the extracted text to stdout so that I can use the result in my application.
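For context, here is a minimal sketch of how a command like this can be invoked from Node, assuming child_process.execFile is used. The helper name extractText and the maxBuffer value are hypothetical and only for illustration, not my exact code.

```javascript
const { execFile } = require("node:child_process");

// Hypothetical helper: runs `<bin> -layout <source> -` and resolves with stdout.
// `source` is whatever is passed as the PDF argument (a path or, as above, a URL).
function extractText(source, bin = "pdftotext") {
  return new Promise((resolve, reject) => {
    execFile(
      bin,
      ["-layout", source, "-"],
      { maxBuffer: 64 * 1024 * 1024 }, // assumed limit; large PDFs can exceed the default
      (err, stdout, stderr) => {
        if (err) {
          // Surface the tool's own message (e.g. "Cannot handle URI ...") when there is one.
          reject(new Error(stderr || err.message));
          return;
        }
        resolve(stdout);
      }
    );
  });
}
```

Calling extractText("https://example.com/myfile.pdf") is the step that fails inside the container.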
This works fine when running the application directly on my local machine, but when run inside a Docker container (node:18-alpine), I receive the error:
Internal Error: Cannot handle URI 'MY_URL'
I believe the container is the cause: the command runs fine on my local machine with pdftotext v23.03.0, but in a container with the same version I get the error. The same error occurs with other versions in the Node container and with other OS base images.
Using curl to download the file to a temporary location in the container and then running pdftotext on that file works fine. However, I have trouble creating files on my Azure App Service instance once the app is deployed.
Any help is appreciated, as are pointers to other ways of doing this. I have not found any other Node.js PDF parsing library that preserves the layout the way pdftotext does.
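One direction that might avoid both the URL handling and the temporary file is to download the PDF in Node and pipe it to pdftotext over stdin. The sketch below assumes the installed poppler build accepts "-" as the input file to mean stdin, and that the global fetch from Node 18+ is available. The helper names runWithStdin and pdfUrlToText are hypothetical, and I have not verified this on Azure App Service.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: feeds `input` to a command's stdin and resolves with its stdout.
function runWithStdin(bin, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(bin, args, { stdio: ["pipe", "pipe", "pipe"] });
    const out = [];
    const err = [];
    child.stdout.on("data", (chunk) => out.push(chunk));
    child.stderr.on("data", (chunk) => err.push(chunk));
    child.on("error", reject); // e.g. the binary is not installed
    child.stdin.on("error", () => {}); // ignore EPIPE if the child exits early
    child.on("close", (code) => {
      if (code === 0) {
        resolve(Buffer.concat(out).toString("utf8"));
      } else {
        reject(new Error(`exit ${code}: ${Buffer.concat(err).toString("utf8")}`));
      }
    });
    child.stdin.end(input);
  });
}

// Hypothetical helper: downloads the PDF into memory and pipes it to pdftotext.
async function pdfUrlToText(url) {
  const res = await fetch(url); // global fetch, Node 18+
  if (!res.ok) throw new Error(`Download failed: HTTP ${res.status}`);
  const pdf = Buffer.from(await res.arrayBuffer());
  // Assumption: first "-" reads the PDF from stdin, second "-" writes text to stdout.
  return runWithStdin("pdftotext", ["-layout", "-", "-"], pdf);
}
```

This holds the whole PDF in memory, which seems acceptable for modest upload sizes. It would need a different approach for very large files.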