Questions tagged [python-pdfreader]

Python API to parse PDF documents, extract texts (plain and formatted), images, XObjects, Forms and other data. Provides direct access to all object attributes and object history. Follows PDF 1.7 specification.

See pdfreader - Tutorials and Examples

32 questions
2
votes
1 answer

How to view a pdf file generated in databricks

I tried generating a sample PDF file using the code below. I believe a PDF has been generated, but I can't view it. How can I view this PDF, and how do I export it? I am new to Databricks. Please help me find a solution. Thanks. from fpdf import…
Kiran
2
votes
2 answers

Python does not print PDF with pyPDF2

I tried to print pages of a pdf document: import PyPDF2 FILE_PATH = 'my.pdf' with open(FILE_PATH, mode='rb') as f: reader = PyPDF2.PdfFileReader(f) page = reader.getPage(0) # I tried also other pages e.g 1,2,.. …
rob
2
votes
1 answer

How to extract some mathematical expression from pdf using python?

I have a PDF which has math equations like this. I am trying to extract the objective questions from the PDF file and convert them into a CSV file using Python, in such a way that each row of the table contains a question, four options in each column and a…
1
vote
0 answers

Converting PDF Table from URL into a Pandas Dataframe?

Having issues converting PDF data into a dataframe depending on how the PDF is uploaded to the website. Hi all, Does anyone have any ideas on how to read an uploaded PDF's data into a pandas dataframe? I am having issues doing it with certain…
1
vote
1 answer

Decrypting a pdf file

I am trying to decrypt a PDF file using a brute-force approach. The call "pdfReader.decrypt(password)" returns an enum of type PasswordType. I am not able to figure out how to compare this enum so that I can print the message that the file is decrypted…
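The enum comparison asked about here can be sketched as follows. This is only an illustration under stated assumptions: `PasswordType` below is a local stand-in defined for the example (its member names are assumed, not imported from any library), and `try_password` is a hypothetical helper that takes the place of the real `decrypt` call so the snippet runs without a PDF.

```python
from enum import IntEnum


class PasswordType(IntEnum):
    """Local stand-in for the enum returned by decrypt(); member names are assumed."""
    NOT_DECRYPTED = 0
    USER_PASSWORD = 1
    OWNER_PASSWORD = 2


def try_password(candidate: str, real_password: str) -> PasswordType:
    """Hypothetical replacement for pdfReader.decrypt(candidate)."""
    if candidate == real_password:
        return PasswordType.USER_PASSWORD
    return PasswordType.NOT_DECRYPTED


def brute_force(candidates, real_password):
    """Return the first candidate that decrypts, or None if none do."""
    for candidate in candidates:
        result = try_password(candidate, real_password)
        # Compare against the enum member itself rather than a raw int or string.
        if result is not PasswordType.NOT_DECRYPTED:
            return candidate
    return None


found = brute_force(["1234", "letmein", "secret"], real_password="secret")
print("decrypted with:", found)
```

Comparing with `is` or `==` against a named member keeps the check readable and avoids depending on the underlying integer values.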
1
vote
1 answer

Reading images from pdf and extract Text from it

Problem Statement: I have a pdf which contains n number of pages and each page has 1 image whose text I need to read and perform some operation. What I tried: I have to do this in python, and the only library I found with the best result is…
1
vote
1 answer

Randomly damaged pdf files when using requests.get() with Python to download pdf

Thank you for reading my post. I have a list of urls for pdf files. for eachurl in url_list: print(eachurl) Below are the links for my…
Jacob Ho
1
vote
1 answer

Convert .pdf to .docx on Adobe pdf services API (using Python)

I'm trying to write a Python program that converts ".pdf" files to ".docx" files, using Adobe PDF Server API (free trial). I've found literature explaining how to transform any ".pdf" file into a ".zip" file containing ".txt" files (restoring text data) and…
1
vote
2 answers

PDF document: How to verify the digital signature using python?

We are working on an RPA project and extract data from PDF to Excel using Python. Now we need to verify the digital signature in the PDF.
0
votes
1 answer

Better Layout Output for PDF Tables Extracted using Camelot

I'm building a python program using Camelot that extracts tables from a PDF (see code below). I am able to successfully execute the code, but I am hitting a road block on how to get a better output result. Specifically, I'm trying to get the code to…
0
votes
1 answer

Extract consecutive two pages from a pdf document and save each file with a text from each first page as the filenames

I have a 100-page PDF document. Each pair of pages contains unique employee data. I need Python code to extract each pair of pages and save them as separate files, with filenames taken from the text extracted from each first page. For example, the 100 page…
Normad68
0
votes
1 answer

Is there a Python module I can use to correct words that have random spaces in them?

I'm analysing a PDF and for some reason many of the words have random spaces inside them, or no spaces between them, after I load the text into Python. I'm using PdfReader from PyPDF2. Examples: Y ou’re sweet, but I feel fine. I wish I feltas calmas you look. The strange thing…
Rishi B
0
votes
1 answer

I am getting the following error in my code: "'_VirtualList' object is not callable"

This is the code: import os from openpyxl import Workbook from PyPDF2 import PdfReader input_folder = r"C:\Users\91620\OneDrive\Desktop\Final Year Project\case laws (2)\New folder (2)" output_file = r"C:\Users\91620\OneDrive\Desktop\Final Year…
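The code in this excerpt is truncated, so the cause is an assumption: a "not callable" message like this usually appears when a sequence-like object (for example a pages collection) is called with parentheses instead of being indexed or iterated. `PageList` below is a hypothetical stand-in written for this sketch, not the library's class.

```python
class PageList:
    """Hypothetical stand-in for a sequence-like pages object (not the library's class)."""

    def __init__(self, pages):
        self._pages = list(pages)

    def __len__(self):
        return len(self._pages)

    def __getitem__(self, index):
        return self._pages[index]


pages = PageList(["text of page 1", "text of page 2"])

try:
    pages(0)  # calling the object like a function raises TypeError
except TypeError as exc:
    call_error = str(exc)
    print(call_error)

first_page = pages[0]           # index with square brackets instead
all_pages = [p for p in pages]  # or iterate over the object directly
print(first_page, len(all_pages))
```

If the real code calls the pages attribute with parentheses, switching to indexing or a `for` loop would be the corresponding change.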
0
votes
1 answer

expected str, bytes or os.PathLike object, not TextIOWrapper error

Hello, I want to make a PDF reader, but an error occurs: "expected str, bytes or os.PathLike object, not TextIOWrapper". Here is the code: import PyPDF2 import pyttsx3 from tkinter import * from tkinter.filedialog import askopenfile from…
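The code in this excerpt is cut off, so the following is a sketch of one likely cause rather than a diagnosis: this error is raised when an already-open file object is passed where a path is expected. `askopenfile` returns an opened file object, whereas `askopenfilename` returns a path string. The example reproduces the error with a temporary file (no dialog needed) and shows that using the object's `.name` attribute avoids it.

```python
import os
import tempfile

# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")
    path = tmp.name

# An already-open text file object, similar to what a dialog that
# returns a file handle (rather than a path) would give you.
handle = open(path, "r")

try:
    open(handle)  # a file object passed where a path is expected
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Passing the path string stored in handle.name works.
with open(handle.name, "r") as f:
    content = f.read()

handle.close()
os.remove(path)
print(content)
```

In a tkinter program, the equivalent change would be either to request a path with `askopenfilename` or to pass the returned object's `.name` to the code that expects a path.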
0
votes
0 answers

Reading text PDFReader

Can anyone tell me why, when I run this code, it gives me back a link? The file is saved locally on my computer as a PDF. When I open the file, it opens directly in Adobe Reader and there is no link. This is a deed with names and legal…
mason