Highest Voted 'pdfparser' Questions

5

votes

0 answers

Read PDF one page at a time - Pdf.js

I am trying to parse a PDF with more than 300 pages. I am using the pdf-parse npm package. But my application crashes while parsing the PDF. Is there a way I can parse one page at a time? Below is the…

javascript pdf.js pdfparser

asked Jan 18 '20 at 10:37

user10090131

4

votes

3 answers

Read specific value based on label name from PDF in C#

I have an ASP.NET Core 2.0 C# application which reads/parses a PDF file and gets the text. From this I want to read a specific value that has a specific label name. As you can see in the image below, I want to get the value 171857, which is the invoice number, and…

c# pdf itext pdfparser

asked May 16 '19 at 07:08

prog1011

3,425
3
30
57

3

votes

2 answers

Why do pdf parsing libraries pdf2json and pdf-parse seem to not work with Next JS app router?

I've been trying to implement pdf parsing logic in my Next JS app. It seems the libraries pdf2json and pdf-parse don't work with the new Next JS app router. Steps to reproduce: Run npx create-next-app@latest and follow the prompts, and say Yes to…

next.js nextjs13 pdfparser pdf2json

asked Jun 07 '23 at 14:06

Andrew Luo

31
1

3

votes

2 answers

Arabic pdf text extraction

I'm trying to extract text from Arabic PDFs (raw data extraction, not OCR). I tried many packages and tools, including Python packages, PDFBox, the Adobe API, and many others, and all of them failed to extract the text correctly, either…

pdf text-extraction pdf-parsing pdfparser pdftextstream

asked Jun 09 '22 at 11:45

B.A

45
4

1

vote

0 answers

Extracting specific data via coordinates using php pdfParser

I want to extract specific data from various PDFs that are 3-4 pages each. I don't want to parse everything (all the text of each page) and then use, for example, regular expressions to match the data that I want. So I was looking at the…

php parsing text-parsing pdf-parsing pdfparser

asked Apr 10 '23 at 09:56

ThunderBoy

391
1
3
18

1

vote

1 answer

Issue using Apache tika parser when trying to parse pdf having text contains image

I am using these two dependencies: tika core 2.6.0 and tika parser standard package 2.6.0. Parsing works fine for these cases: PDF files with text, PDF files with images, and text files and other extensions. Parsing is failing with pdfparser runtime…

java scala apache-tika runtimeexception pdfparser

asked Nov 11 '22 at 21:04

DeadPool

40
8

1

vote

0 answers

I have an error when I use the parseFile function with pdfparser

I want to parse a file with https://github.com/smalot/pdfparser. The problem: when I use $parser->parseFile($pathToPdf) I get this: Argument 1 passed to Smalot\PdfParser\Parser::parseHeader() must be of the type array, string given, called in…

php pdf pdfparser

asked Oct 25 '22 at 14:17

LocDog

36
4

1

vote

1 answer

How to decode PDF file and encode it back?

My overall goal is to make some PDF files conform to the PDF/A standard for archival purposes. They fail one requirement, namely that some glyph mappings map to 0, which they should not. My usual strategy was to use an old software called "Pdfedit"…

pdf adobe qpdf pdfparser

asked Sep 01 '20 at 14:34

Smogshaik

180
2
13

1

vote

0 answers

Getting the page size from uploaded PDF metadata file in my PHP code

Here I used a PDF parser PHP library: parseFile('ss.pdf'); // Retrieve all details…

php pdf metadata pdfparser

asked Jan 14 '20 at 13:50

Steven Ragy

11
2

1

vote

1 answer

Parsing PDF and getting the header portion information

I am trying to parse the contents of PDFs. Basically, they are scientific research papers. Here's the portion I am trying to grab: I only need the paper title and the author name(s). What I used is the PDF Parser Library. And I was able to get the…

php parsing pdf pdfparser

asked Jul 11 '19 at 11:13

Akhilesh B Chandran

6,523
7
28
55

1

vote

0 answers

Getting the same junk when extracting Hindi / Devanagari text from PDF with pdftotext or pdfparser

I am using PHP PdfParser and pdftotext to extract Hindi/Devanagari text from a PDF, but I am getting the same kind of junk from both. Junk, for example: f{kfrt114; rhanz feJ dk tUe lu~ 1977 esa v;ksè;k (mÙkj…

php pdf pdftotext pdfparser

asked Apr 18 '19 at 05:47

KJA

85
5

1

vote

1 answer

pdfparser from pdfminer: PDFException: PDFDocument is not initialized

I don't understand this error. I want to open a PDF and loop over the pages, but I'm getting this exception and I couldn't find much by googling it. Here is the example that fails: from pdfminer.pdfparser import PDFParser, PDFDocument from os.path…

python-3.x exception pdfminer pdfparser

asked Feb 08 '19 at 17:00

Atirag

1,660
7
32
60

1

vote

0 answers

Getting empty combo box value from PDF file in express js

I'm getting an empty combo box value from a PDF file using the 'pdf2json' parser in express.js. The PDF file shows the different options inside the combo box and also stores the state of the selection when the file is saved, but when I try to parse…

javascript express pdfparser

asked Mar 14 '18 at 13:12

jasmeetsohal

141
1
2
11

1

vote

0 answers

TCPDF_PARSER ERROR: Invalid object reference: Array

I'm using the PDFparser library (https://github.com/smalot/pdfparser) to convert a PDF file to text. When I try to convert a file on a local web server, it parses OK. When I try to convert a file on a remote web server, it fails with the following error:…

php parsing pdf pdfparser

asked Feb 04 '18 at 03:02

Александр Чи

129
8

0

votes

1 answer

Read pdf-content in next.js 13 api route-handler results in 404

I followed this tutorial (https://www.youtube.com/watch?v=enfZAaTRTKU) on YouTube, which teaches how to upload a PDF file to an Express server and read out its content. I do not want to display the PDF; I only care about the text. I have…

pdf next.js pdfparser

asked May 27 '23 at 08:02

frankBang

117
1
11
