I want to read a PDF file in Python. I have tried a couple of approaches, PdfReader and pdfquery, but neither gives me the result as a string. I want to extract some of the content from the PDF file. Is there any way to do that?
See: http://stackoverflow.com/questions/2481945/how-to-read-line-by-line-in-pdf-file-using-pypdf – twots Aug 20 '15 at 06:50
2 Answers
0
PDFminer is a tool for extracting information from PDF documents.
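As a rough sketch of how that could look, the snippet below assumes the maintained fork `pdfminer.six` is installed (`pip install pdfminer.six`) and uses its `pdfminer.high_level.extract_text` function to get the whole document as one string. The helper names `extract_pdf_text` and `lines_containing` are made up for illustration, and the sample text is invented; scanned PDFs that contain only images will not yield text this way.

```python
def extract_pdf_text(path):
    """Return the text pdfminer.six can extract from the PDF at `path`."""
    # Imported lazily so the helper below can be used (and tested)
    # even on a machine where pdfminer.six is not installed.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def lines_containing(text, keyword):
    """Return the stripped lines of `text` that mention `keyword`, ignoring case."""
    needle = keyword.lower()
    return [line.strip() for line in text.splitlines() if needle in line.lower()]


# Invented sample standing in for the string extract_pdf_text() would return.
sample_text = "Invoice 42\n  Total: 10 EUR\nThank you"
matches = lines_containing(sample_text, "total")
```

With a real file, `lines_containing(extract_pdf_text('my_file.pdf'), 'total')` would keep only the lines of interest, where `'my_file.pdf'` and `'total'` are placeholders.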


Nishant Nawarkhede
-1
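Does it matter in your case whether the file is a PDF or not? If you just want to read the file's raw contents, open it as you would open any normal file.
For example:
# 'rb' reads raw bytes; in Python 3, text mode usually fails on a PDF with UnicodeDecodeError
with open('my_file.pdf', 'rb') as file:
    content = file.read()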
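As the comments on this answer point out, the raw contents of a PDF are not readable text. One reason is that page content streams are commonly stored compressed (the FlateDecode filter, which is zlib/deflate), so the words never appear literally in the file. The sketch below imitates that storage step with the standard `zlib` module; it does not build a real PDF, and the page text and the `is_pdf_header` helper are invented for illustration.

```python
import zlib

# Invented page text, standing in for what a PDF viewer would display.
page_text = b"Hello from page one. " * 20

# Mimic how a FlateDecode content stream is stored inside a PDF file.
stored_stream = zlib.compress(page_text)


def is_pdf_header(first_bytes):
    """Return True if the bytes begin with the PDF magic marker '%PDF-'."""
    return first_bytes.startswith(b"%PDF-")


# The readable text is not present in the stored bytes ...
text_visible_in_raw = page_text in stored_stream
# ... but decompressing recovers it, which is part of what a PDF library does.
recovered = zlib.decompress(stored_stream)
```

This is why a plain `file.read()` shows junk for a PDF while working for a `.txt` file; a parser such as PDFMiner has to decode the streams and then interpret the drawing operators inside them.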

hspandher
If the file is in PDF format, it will return junk characters. – nilay gupta Aug 20 '15 at 06:55
Do you want to render the file, or just read its contents for some purpose? – hspandher Aug 20 '15 at 06:57
@hspandher I just want to read the PDF and then save some of its content in string format. – nilay gupta Aug 20 '15 at 07:02
What I'm saying is that a PDF is of course encoded. Do you want to 'read' the contents of the PDF (by read I mean decipher), or do you just want to save the contents to some other file? – hspandher Aug 20 '15 at 07:09
@hspandher The method you suggested is fine for a txt file, but when I do the same with a PDF file it returns junk characters. – nilay gupta Aug 20 '15 at 07:45