I am trying to write some code that extracts the text from a PDF and then writes that text to a new text file. This is what I wrote:
import os
import pyPdf
import re

## function that extracts text from a pdf
def pdfcontent(filename):
    ct = ""
    pdf = pyPdf.PdfFileReader(file(filename, "rb"))
    for i in range(0, pdf.getNumPages()):
        ct += pdf.getPage(i).extractText() + "\n"
    return ct

## function that generates a txt file from a pdf
def pdftotxt(filename):
    ## first, convert pdf to txt
    pdfct = pdfcontent(filename)
    ## fix filename problem
    newfn = re.sub(".pdf", "", filename)
    ## now generate txt
    fo = open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\' + newfn + ".txt","wb")
    fo.write(pdfct)
    fo.close()

pdftotxt("PDFfromDocumentum.pdf")
EDIT: I fixed my previous problems, but now a new one has come up:
File "C:/Users/xxx/PycharmProjects/untitled/fdsa", line 22
fo = open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\' + newfn + ".txt","wb")
^
SyntaxError: EOL while scanning string literal
It seems to me that Python took
fo = open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\' + newfn + ".txt","wb")
as one unterminated string instead of a statement. How can I fix this?
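
A note on the likely cause, with a possible workaround: in Python, a raw string literal cannot end with a single backslash. Even inside r'...', a backslash followed by a quote keeps the quote from closing the string, so the literal above runs on to the end of the line, which would explain the "EOL while scanning string literal" message. The sketch below is one way around it, not the only one. It assumes the folder is written without the trailing backslash and lets os.path.join add the separator. The names OUT_DIR and txt_path_for are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical output folder, written WITHOUT a trailing backslash so that
# the raw string literal is valid.
OUT_DIR = r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt'


def txt_path_for(pdf_name, out_dir=OUT_DIR):
    """Return the path of the .txt file that corresponds to pdf_name."""
    # splitext strips only the final extension. A pattern such as ".pdf"
    # passed to re.sub has an unescaped dot, which matches any character.
    base, _ext = os.path.splitext(os.path.basename(pdf_name))
    # os.path.join inserts the path separator for us, so no string literal
    # has to end in a backslash.
    return os.path.join(out_dir, base + ".txt")


print(txt_path_for("PDFfromDocumentum.pdf"))
```

The returned path could then be passed to open() in place of the concatenated string. Writing the folder as a normal string with doubled backslashes, or with forward slashes, should also avoid the error.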