I am trying to read diacritized text from a PDF file using this code:

import PyPDF2 
import pdfplumber.utils
import pdfminer.pdftypes
import arabic_reshaper
from pdfplumber.pdf import PDF
from bidi.algorithm import get_display
from PyPDF2 import PdfFileReader, PdfFileWriter
#import pyPdf
import codecs
input_filepath = "D:\Arabic research\input.pdf"#file path
output_filepath = "D:\Arabic research\output.txt"#output text file path
output_file = open(r"D:\Arabic research\output.txt", "wb")#open output file
pdf = PyPDF2.PdfFileReader(codecs.open(r"D:\Arabic research\input.pdf", "rb", encoding='utf-8'))#read PDF
for page in PyPDF2.pages:#loop through pages
    page_text = page.extractText()#get text from page
    page_text = page_text.decode(encoding='utf-8')#decode 
    print(page_text)
    output_file.write(page_text)#write to file
output_file.close()#close

But I get the following error: [Errno 22] Invalid argument
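Separately from the reported error, the write step in the code above mixes bytes and text: the output file is opened with "wb", while the extracted page text is decoded and then written. A minimal sketch of the text-mode alternative follows, assuming the extraction call returns ordinary Python 3 `str` values. The helper name `write_pages` and the sample strings are hypothetical and only stand in for the real page text.

```python
import os
import tempfile


def write_pages(page_texts, output_path):
    """Write each page's text to output_path as UTF-8, one page per line block."""
    # Text mode with an explicit encoding: str goes in, UTF-8 bytes land on disk.
    # No .decode() call is needed, because str objects are already decoded text.
    with open(output_path, "w", encoding="utf-8") as output_file:
        for text in page_texts:
            output_file.write(text + "\n")


# Hypothetical stand-in for text extracted from two PDF pages.
sample_pages = ["بِسْمِ", "اللَّهِ"]
sample_path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_pages(sample_pages, sample_path)

# Read the file back to confirm the diacritics survived the round trip.
with open(sample_path, encoding="utf-8") as check_file:
    round_trip = check_file.read()
print(round_trip == "بِسْمِ\nاللَّهِ\n")
```

Using a `with` block also closes the file if an exception occurs partway through the loop.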

LOTUS
  • The error seems related to the file name, not the actual code. Could you please [edit] to show the full traceback? – tripleee Dec 16 '21 at 07:35
  • I notice that you lack the `r` prefix on the first two file names, so that's probably your bug. If that's not it, please revert and clarify. – tripleee Dec 16 '21 at 07:36
  • For legibility, probably put spaces on both sides of the `#` comment character. – tripleee Dec 16 '21 at 07:37
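The `r` prefix mentioned in the comments can be illustrated with a short sketch. The path below is hypothetical and was chosen because it contains `\t` and `\n`, which Python does interpret as escape sequences. In the question's own paths the sequences `\A`, `\i` and `\o` are not recognised escapes, so Python keeps them unchanged (newer versions emit a warning), which suggests the missing prefix may not be the whole explanation for the error.

```python
from pathlib import PureWindowsPath

# Hypothetical path, not the one from the question.
plain = "D:\temp\notes.txt"   # \t becomes a tab, \n becomes a newline
raw = r"D:\temp\notes.txt"    # raw string: backslashes are kept literally

print(repr(plain))
print(repr(raw))
print(plain == raw)           # False

# Forward slashes avoid the escaping problem altogether;
# PureWindowsPath renders them back as backslashes.
portable = PureWindowsPath("D:/temp/notes.txt")
print(str(portable) == raw)   # True
```

Either the raw-string form or the forward-slash form gives a path whose characters match what was typed.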

0 Answers