
I want to read PDF files dynamically. When I upload a PDF through Django, the file object is of class `django.core.files.uploadedfile.TemporaryUploadedFile`. If I read the file, I only get bytes, which are not human-readable. Below is my code:

def file(request):
    pdf = request.FILES['file']
    pdf1 = (pdf.read())

If I print `pdf1`, I get bytes, which are not readable. How can I read an uploaded PDF in Django and extract its text?

  • `response = HttpResponse(content_type='application/pdf'); response['Content-Disposition'] = 'attachment; filename="somefilename.pdf"'` – felipsmartins Dec 31 '19 at 18:57
  • I don't want an HTTP response. My question is: if I print pdf1, I get bytes, not text. I want the text contained in the PDF. – Mohd Abdul Raoof Dec 31 '19 at 18:59
  • You can use the following [question](https://stackoverflow.com/a/26495057/10224558) to solve your problem – Ziad Abouelfarah Dec 31 '19 at 19:00
  • @MohdAbdulRaoof **if i print pdf1 it will give bytes no text i want the text present** - ??? That makes no sense. Now, if you want to *PARSE* the PDF file in order to extract data, that's another question. – felipsmartins Dec 31 '19 at 19:04
  • You are right, but I am uploading the file from a Django app and it arrives in the `pdf` variable. After that line, `pdf1` holds bytes. – Mohd Abdul Raoof Jan 01 '20 at 02:55
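
To illustrate the distinction drawn in the comments: `read()` returns the raw PDF container, which normally begins with the marker `%PDF-`, and the text inside is usually stored in compressed streams, so printing or decoding the bytes will not reveal it. A minimal sketch using only the standard library, where `io.BytesIO` stands in for the upload and `looks_like_pdf` is a hypothetical helper name, not part of Django:

```python
import io

PDF_MAGIC = b"%PDF-"  # marker that normally opens a PDF file


def looks_like_pdf(file_obj):
    """Return True if the binary stream starts with the PDF marker."""
    start = file_obj.tell()
    header = file_obj.read(len(PDF_MAGIC))
    file_obj.seek(start)  # rewind so a PDF parser can read from the same position
    return header == PDF_MAGIC


# io.BytesIO is a stand-in for request.FILES['file'] (an assumption for this demo)
sample = io.BytesIO(b"%PDF-1.4\n...binary page data...")
print(looks_like_pdf(sample))  # True
```

A check like this only confirms that the upload is probably a PDF; getting the text out still requires a PDF parsing library.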

1 Answer


If you use the built-in `.open()` method, you can handle the upload like a regular file and then extract the text as mentioned in the comments, e.g. by using PyPDF2 and looping over the pages:

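import PyPDF2  # uses the PyPDF2 1.x/2.x names (removed in PyPDF2 3.0)

# Inside the view function, where `request` is available
pdf = request.FILES['file']
pdfFileObj = pdf.open()  # file-like object for the uploaded PDF
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pdftext = ""  # must be initialized before concatenating page text
for p in range(pdfReader.numPages):
    pageObj = pdfReader.getPage(p)
    pdftext = pdftext + pageObj.extractText()
print(pdftext)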
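
A rough equivalent with the newer `pypdf` package, in case the older PyPDF2 names are unavailable. This is only a sketch, not the code from the original answer: `extract_pdf_text` is a hypothetical helper name, and it assumes `pypdf` is installed and that the PDF has a real text layer (scanned pages would need OCR instead).

```python
def extract_pdf_text(uploaded_file):
    """Return the text of every page in a PDF file-like object."""
    # Third-party dependency (pip install pypdf); imported here so it is
    # only required when a PDF is actually parsed.
    from pypdf import PdfReader

    reader = PdfReader(uploaded_file)  # accepts a path or a binary stream
    # extract_text() may yield nothing for image-only pages, hence `or ""`.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Hypothetical use inside a Django view:
# def file(request):
#     text = extract_pdf_text(request.FILES['file'])
#     print(text)
```

Passing `request.FILES['file']` directly should work because the uploaded file behaves like a binary stream; pages are joined with newlines so that page boundaries stay visible in the result.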
horseshoe
  • After searching everywhere. This worked for me. Thanks. I had to change it according to pypdf module. But rest was perfect. – Ronn Wilder Aug 21 '23 at 09:47