Questions tagged [pypdf4]

PyPDF4 is a fork of PyPDF2, which is itself a fork of pypdf. PyPDF4 had its only release in 2018, whereas pypdf is maintained again.

PyPDF4 is dead. Please only use this tag if you are using PyPDF4. The pypdf project has its own tag.

In December 2022, PyPDF2 was deprecated in favor of pypdf.

12 questions

1

vote

1 answer

PyPDF4 Error [PdfReadWarning: Superfluous whitespace found in object header]

    import PyPDF4
    path = f'C:/Users/Gabriel/Desktop/Curso/Teste/pdfs/teste/ABRAHAO.pdf'
    pdf = open(path, 'rb')
    reader = PyPDF4.PdfFileReader(pdf, strict=False)
    page = reader.getPage(0)
    text = page.extractText()
    text = text.strip()

reading a pdf file,…

python pdf pypdf4

asked Jan 24 '23 at 20:04

Espanholzx zx

11
1

1

vote

0 answers

Recursion with dictionary

I am using PyPDF4 to access the "full" structure of a PDF and recursively store its values in a dictionary. The algorithm is supposed to work page by page. IndirectObject, a PDF object used for common data structures, needs to be cast to a dictionary…

python python-3.x recursion pypdf pypdf4

asked Sep 10 '22 at 23:59

cards

3,936
1
7
25
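
For the "Recursion with dictionary" question above, here is a minimal sketch of the general technique: resolve each indirect reference, recurse into dictionaries and lists, and stop when a reference points back to an object already on the current path (a PDF page refers to its parent through /Parent, and the parent lists the page again under /Kids). The `Ref` class, `to_plain`, and the sample `store` are hypothetical stand-ins so the example runs without PyPDF4; with the real library, resolving a reference would presumably go through the IndirectObject's own lookup method instead.

```python
from typing import Any, Dict


class Ref:
    """Hypothetical stand-in for an indirect reference (not a PyPDF4 class)."""

    def __init__(self, store: Dict[int, Any], idnum: int) -> None:
        self.store = store
        self.idnum = idnum

    def get_object(self) -> Any:
        # Look up the object this reference points at.
        return self.store[self.idnum]


def to_plain(obj: Any, seen: frozenset = frozenset()) -> Any:
    """Recursively convert nested values into plain dicts and lists."""
    if isinstance(obj, Ref):
        if obj.idnum in seen:
            # Already on the current path: stop to avoid infinite recursion.
            return f"<cycle to object {obj.idnum}>"
        return to_plain(obj.get_object(), seen | {obj.idnum})
    if isinstance(obj, dict):
        return {key: to_plain(value, seen) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item, seen) for item in obj]
    return obj


# Assumed sample data: a page and its parent that reference each other.
store: Dict[int, Any] = {}
store[1] = {"/Type": "/Page", "/Parent": Ref(store, 2)}
store[2] = {"/Type": "/Pages", "/Kids": [Ref(store, 1)]}

page = to_plain(Ref(store, 1))
print(page)
```

Tracking the ids seen along the current path, rather than globally, lets the same shared object (a font, for example) appear in several places in the result while still breaking true cycles.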

0

votes

1 answer

Extract string from PDF that contains a URL

I have a PDF document with a few hyperlinks in it, and I need to extract the text/string from the PDF that contains a URL. I have tried PyPDF2 and PyPDF4. I am able to extract the URLs but unable to extract the string that contains the URL. For…

python python-3.x pdf pypdf4

asked Jun 30 '23 at 12:43

Sandy

23
5

0

votes

1 answer

PDF gets corrupted when adding a watermark in an Amazon S3 Lambda function

I have an S3 bucket that stores PDFs. We have to conditionally apply a watermark to each PDF. We opted for an S3 Object Lambda access point to achieve this. Saving the file back to S3 works fine, but when returning it dynamically in…

python python-3.x amazon-s3 aws-lambda pypdf4

asked May 30 '23 at 07:04

krishna

151
4
15

0

votes

2 answers

How to use Python to find the page numbers where a certain font is used in a PDF

How can I use Python to find the page numbers where a certain font is used in a PDF? I tried the PyPDF2 library, but it did not produce the expected output. For example, wherever the Arial font is used, I want to print those page numbers. Here is the MWE: import…

python python-3.x pdf pypdf pypdf4

asked May 27 '23 at 09:13

TeX_learner

123
6

0

votes

0 answers

I am getting a "key must be PDF Object" error in PyPDF4

I am getting my error here:

    from PyPDF4.generic import ByteStringObject
    pdf_reader = PyPDF4.PdfFileReader(pdf_file)
    page = pdf_reader.pages[0]
    page.mergePage(pdf_reader.pages[0])
    content = page['/Contents'].getObject()
    content = re.sub(b"/Tx…

python excel pdfmerger pypdf4

asked Apr 02 '23 at 12:38

timp bill

57
7

0

votes

0 answers

Some PDF attachments do not open when added using PyPDF4. How to fix this?

I am using PyPDF4 to add file attachments to a base PDF. The final PDF file opens and has all the attachment PDFs attached, but some of them do not open when double-clicked. Could you advise on why this happens and what the workaround is? The…

python pdf attachment pypdf4

asked Mar 18 '23 at 15:51

Raghu Varier

11
3

0

votes

0 answers

Password Protecting PDF files for Editing

I need help writing code that takes the PDF files in a specified folder, adds a password for editing, and then saves the password-protected PDF files into a specified folder. The goal is to lock the PDFs for editing, but essentially turning…

pdf password-protection pypdf pypdf4

asked Mar 08 '23 at 00:07

ddub66

1

0

votes

1 answer

How to find the line number where a string appears using PyPDF4?

I am using PyPDF4 to read a PDF file. The file has text like: Abrechnung30.11.2022 0,00+ Kontostand/Rechnungsabschlussam30.11.2022 672,06H Rechnungsnummer:2022-11-3020:53:31.468209 01.12.2022 01.12.2022 Barausz.Debit.KFK What I am trying to do is:…

python string pypdf4

asked Jan 29 '23 at 19:36

Kevin

1
1
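
For the line-number question above, the truncated excerpt does not say exactly what the asker wants, so this is only a sketch of the step suggested by the title: once the page text has been extracted into a single string, split it into lines and report the 1-based numbers of the lines that contain the search string. The helper name `lines_containing` is hypothetical, and the line breaks in `sample` are an assumption, since the excerpt shows the extracted text run together.

```python
from typing import List


def lines_containing(text: str, needle: str) -> List[int]:
    """Return the 1-based numbers of every line in text that contains needle."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


# Assumed line layout; the strings themselves come from the question excerpt.
sample = "\n".join([
    "Abrechnung30.11.2022 0,00+",
    "Kontostand/Rechnungsabschlussam30.11.2022 672,06H",
    "01.12.2022 01.12.2022 Barausz.Debit.KFK",
])

hits = lines_containing(sample, "30.11.2022")
print(hits)  # -> [1, 2]
```

This only works if the extracted text actually keeps its line breaks; if the extraction step drops them, the line numbers cannot be recovered from the string alone.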

0

votes

0 answers

Text read from a PDF is in an unknown encoding

I'm using PyPDF4 to read text from a PDF I downloaded. This works, but the text string is not readable: ÓŒŁ–Ł@`@äŽ–Ł@`@Ä›¥–Ž¢–@¥ŒŒŽ—–ﬁ–Ł Áﬁ⁄–ﬂ–Ł–@›ŁƒŒŽﬂ†£›– As far as I know the file is not encrypted, I can open it in Acrobat Reader without…

python pypdf4

asked Nov 16 '22 at 13:07

Spiffo

3
4

0

votes

0 answers

PDF Generator, merging 3 or more pdf files

I'm trying to extend the code I wrote for 2 files to make it work with an arbitrary number of files. The code works, but the PDF it creates is empty. I know the problem is in my list, which is not a Pdf4 file writer object, but I can't figure it…

python for-loop pypdf4

asked Oct 24 '22 at 11:45

Christopher Pring

31
4

0

votes

0 answers

PDF splitting with bookmarks in Python through PyPDF4: bookmarks are lost in the output

I am trying to create a script that splits out the pages of a PDF for given page numbers/labels. The script produces the split PDF correctly, but some information is lost and needs to be corrected: the bookmarks are lost in the separated PDF,…

python pdf-generation pypdf4

asked Sep 15 '22 at 13:34

TeX_learner

123
6