Questions tagged [pypdf4]

pypdf4 is a fork of pypdf. pypdf4 had its only release in 2018, whereas pypdf is maintained again.

PyPDF4 is a fork of PyPDF2 which is a fork of pypdf.

PyPDF4 is dead. Please only use this tag if you are using PyPDF4. The pypdf project has its own tag.

In December 2022, PyPDF2 was deprecated in favor of pypdf.


12 questions
1
vote
1 answer

PyPDF4 Error [PdfReadWarning: Superfluous whitespace found in object header]

import PyPDF4 path = f'C:/Users/Gabriel/Desktop/Curso/Teste/pdfs/teste/ABRAHAO.pdf' pdf = open(path, 'rb') reader = PyPDF4.PdfFileReader(pdf, strict=False) page = reader.getPage(0) text = page.extractText() text = text.strip() reading a pdf file,…
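A message like the one in this title is a warning rather than an exception, so one option is to filter it with the standard `warnings` module. The sketch below is a minimal illustration, not the accepted answer; the helper name `silence_whitespace_warning` is hypothetical, and it assumes the library reports the message through `warnings.warn`.

```python
import warnings


def silence_whitespace_warning():
    """Ignore warnings whose text starts with the message quoted in the title.

    `message` is a regular expression matched against the start of the
    warning text, so other warnings still get through.
    """
    warnings.filterwarnings(
        "ignore",
        message="Superfluous whitespace found in object header",
    )


# Usage (assumption: call this once before opening the PDF):
# silence_whitespace_warning()
# reader = PyPDF4.PdfFileReader(pdf, strict=False)
```

Filtering by message keeps unrelated warnings visible, which is usually safer than ignoring a whole warning category.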
1
vote
0 answers

Recursion with dictionary

I am using PyPDF4 to access the "full" PDF structure of a PDF and recursively store its values in a dictionary. The algorithm is supposed to work page-wise. A common PDF object type, IndirectObject, needs to be cast to a dictionary…
cards
  • 3,936
  • 1
  • 7
  • 25
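A recursive walk of the kind described in this question could be sketched as follows. This is an illustration under assumptions, not the asker's code: the name `to_plain` and the `"<cycle>"` marker are made up here, and the sketch relies on PyPDF4 objects exposing `getObject()` (which resolves an `IndirectObject`) and on dictionary and array objects behaving like `dict` and `list`.

```python
def to_plain(obj, _seen=None):
    """Recursively copy a PDF object tree into plain dicts and lists."""
    if _seen is None:
        _seen = set()
    # Resolve an indirect reference once; on already-resolved PyPDF4
    # objects getObject() is assumed to return the object itself.
    if hasattr(obj, "getObject"):
        obj = obj.getObject()
    if isinstance(obj, (dict, list)):
        # PDF trees contain back references such as /Parent, so stop when
        # an object is already on the current path.
        if id(obj) in _seen:
            return "<cycle>"
        _seen.add(id(obj))
        if isinstance(obj, dict):
            result = {str(k): to_plain(v, _seen) for k, v in obj.items()}
        else:
            result = [to_plain(v, _seen) for v in obj]
        _seen.discard(id(obj))
        return result
    return obj  # numbers, names, strings, and other leaf values
```

Tracking only the objects on the current path cuts cycles while still letting a shared object appear in more than one branch of the result.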
0
votes
1 answer

Extract string from PDF that contains a URL

I have a PDF document with a few hyperlinks in it, and I need to extract the text/string from the PDF that contains a URL. I have used PyPDF2 and PyPDF4. I am able to extract the URLs but unable to extract the string that contains the URL. For…
Sandy
  • 23
  • 5
0
votes
1 answer

PDF corrupted when adding watermark in Amazon S3 Lambda function

I have an S3 bucket that stores PDFs. We have to conditionally apply a watermark to the PDF. We opted for an S3 Object Lambda access point to achieve this. If we save the file back to S3 it works fine, but when returning dynamically in…
krishna
  • 151
  • 4
  • 15
0
votes
2 answers

How to use Python to find the page numbers where a certain font is used in a PDF

How to use Python to find the page numbers where a certain font is used in a PDF. I tried the PyPDF2 library, but it did not provide the expected output. For example, where the Arial font is used, I want to print those page numbers. Here is the MWE import…
TeX_learner
  • 123
  • 6
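One way to approach this question is to look at each page's `/Resources` → `/Font` dictionary and compare the `/BaseFont` names. The sketch below is only an illustration; the function name `pages_using_font` is hypothetical, and it assumes page objects behave like nested dictionaries, as PyPDF4 page objects do when indexed with `[]`. It does not look inside form XObjects or at resources inherited from parent nodes.

```python
def pages_using_font(pages, font_name):
    """Return 1-based page numbers whose font resources mention font_name."""
    wanted = font_name.lower()
    hits = []
    for number, page in enumerate(pages, start=1):
        if "/Resources" not in page:
            continue
        resources = page["/Resources"]
        if "/Font" not in resources:
            continue
        fonts = resources["/Font"]
        for key in fonts:
            font = fonts[key]
            # Type 3 fonts may have no /BaseFont entry at all.
            base = str(font["/BaseFont"]) if "/BaseFont" in font else ""
            # Substring match, so subset prefixes ("ABCDEF+Arial") and
            # variants ("Arial-BoldMT") are counted as well.
            if wanted in base.lower():
                hits.append(number)
                break
    return hits


# Usage (assumption: PyPDF4's PdfFileReader API):
# reader = PyPDF4.PdfFileReader(open("input.pdf", "rb"))
# pages = [reader.getPage(i) for i in range(reader.getNumPages())]
# print(pages_using_font(pages, "Arial"))
```

A font listed in a page's resources is not necessarily used to draw text on that page, so the result is an upper bound rather than an exact answer.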
0
votes
0 answers

I am getting the error "key must be PDF Object" in PyPDF4

I am getting my error here: from PyPDF4.generic import ByteStringObject pdf_reader = PyPDF4.PdfFileReader(pdf_file) page = pdf_reader.pages[0] page.mergePage(pdf_reader.pages[0]) content = page['/Contents'].getObject() content = re.sub(b"/Tx…
timp bill
  • 57
  • 7
0
votes
0 answers

Some PDF attachments do not open when added using PyPDF4. How to fix this?

I am using PyPDF4 to add file attachments to a base PDF. The final PDF file opens and has all attachment PDFs attached but some of them do not open when double-clicked. Could you advise on why this happens and what's the workaround for the same? The…
0
votes
0 answers

Password Protecting PDF files for Editing

I need help writing code that takes the PDF files in a specified folder, adds a password for editing, and then saves the password-protected PDF files into a specified folder. The goal is to make the PDFs locked for editing, but essentially turning…
ddub66
  • 1
0
votes
1 answer

How to find the number of the line on which a string appears using pyPDF?

I am using PyPDF4 to read a PDF file. The file has text like: Abrechnung30.11.2022 0,00+ Kontostand/Rechnungsabschlussam30.11.2022 672,06H Rechnungsnummer:2022-11-3020:53:31.468209 01.12.2022 01.12.2022 Barausz.Debit.KFK What I am trying to do is:…
Kevin
  • 1
  • 1
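Once the page text has been extracted, finding the line numbers is a plain string problem. A minimal sketch, assuming the extracted text still contains line breaks (PyPDF4's `extractText()` does not always preserve them); the function name `lines_containing` and the sample layout in the usage comment are made up for illustration.

```python
def lines_containing(text, needle):
    """Return the 1-based numbers of the lines in text that contain needle."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


# Usage with a made-up layout of the text from the question:
# sample = "Abrechnung 30.11.2022\nKontostand 672,06 H\nRechnungsnummer: 2022-11-30"
# lines_containing(sample, "Rechnungsnummer")
```

If the extracted text comes back as one long line, the line numbers carry no information and a layout-aware extractor would be needed instead.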
0
votes
0 answers

Text read from a PDF has an unknown encoding

I'm using PyPDF4 to read text from a PDF I downloaded. This works, but the text string is not readable: ÓŒŁ–Ł@`@䎖Ł@`@Ä›¥–Ž¢–@¥ŒŒŽ—–fi–Ł Áfi⁄–fl–Ł–@›ŁƒŒŽfl†£›– As far as I know the file is not encrypted, I can open it in Acrobat Reader without…
Spiffo
  • 3
  • 4
0
votes
0 answers

PDF Generator, merging 3 or more pdf files

I'm trying to extend the code I wrote for 2 files to make it work with an undetermined number of files. The code works but the PDF created is empty. I know the problem is in my list which is not a Pdf4 file writer object but I can't figure it…
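For an arbitrary number of inputs, a loop over one merger object is a common pattern. The sketch below swaps in `PdfFileMerger` instead of the writer-based approach the question mentions, so it is an alternative rather than a fix for the asker's code. The function name `merge_pdfs` and its `merger` parameter are hypothetical; the parameter exists only so the loop can be tried with a stand-in object.

```python
def merge_pdfs(input_paths, output_path, merger=None):
    """Append every input PDF to one merger and write the combined file."""
    if merger is None:
        # Imported lazily so the function can be exercised without PyPDF4.
        from PyPDF4 import PdfFileMerger
        merger = PdfFileMerger()
    try:
        for path in input_paths:
            merger.append(path)  # adds all pages of this file, in order
        with open(output_path, "wb") as out:
            merger.write(out)
    finally:
        merger.close()  # release any input files the merger opened


# Usage (file names are placeholders):
# merge_pdfs(["a.pdf", "b.pdf", "c.pdf"], "merged.pdf")
```

Writing only after the loop has finished matters here: an output written before any pages were added is one way to end up with an empty PDF.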
0
votes
0 answers

PDF splitting with bookmarks in Python through PyPDF4 - bookmarks are lost in the output

I am trying to create a script to split out the PDF pages for given page numbers/labels. The script produces the split PDF correctly, but some information is lost and needs to be corrected: the bookmarks are missing in the separated PDF,…
TeX_learner
  • 123
  • 6