
I am trying to write a block of code that first extracts the text from a PDF and then creates a text file containing that text. This is what I wrote:

import os
import pyPdf
import re

##function that extracts text from pdf
def pdfcontent(filename):
    ct = ""
    pdf = pyPdf.PdfFileReader(file(filename,"rb"))
    for i in range(0,pdf.getNumPages()):
        ct += pdf.getPage(i).extractText() + "\n"
    return ct

##function that generates a txt file from a pdf
def pdftotxt(filename):
    ##first, convert pdf to txt
    pdfct = pdfcontent(filename)
    ##fix filename problem
    newfn = re.sub(".pdf", "", filename)
    #now generate txt
    fo = open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\' + newfn + ".txt","wb")
    fo.write(pdfct)
    fo.close()

pdftotxt("PDFfromDocumentum.pdf")

EDIT: I fixed my previous problems, but then another problem came up:

File "C:/Users/xxx/PycharmProjects/untitled/fdsa", line 22
fo = open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\' + newfn + ".txt","wb")
                                                                                      ^
SyntaxError: EOL while scanning string literal

It seems to me that Python took

fo = open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\' + newfn + ".txt","wb")

as a string instead of a command. What's the solution to this problem?

RomanHotsiy
  • Which file/directory doesn't exist? Are you sure it's not the filename you feed to `PdfFileReader`? Please post the actual traceback. – wflynny Jul 15 '14 at 19:32
  • Note that a newline is denoted by `"\n"`, not `"/n"`. – jwodder Jul 15 '14 at 19:33
  • Seems you are able to solve your problems within not very long time frame. That's very good, and good luck, but people on the internet are probably not interested in a live report of your programming struggle. Please consider posting a question when you are really stuck (and unable to find the solution on SO), instead of editing it every few minutes with your latest achievements... – BartoszKP Jul 15 '14 at 19:40
  • Duplicate of http://stackoverflow.com/questions/2870730/python-raw-strings-and-trailing-backslash – BartoszKP Jul 15 '14 at 19:46

2 Answers


If you want your script to create a new file when it does not exist, use "wb" as the mode.

Refer to this for more information on using file modes.
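A minimal sketch of what the "wb" mode does, written for Python 3 (the question's code is Python 2). It uses a temporary directory so nothing is left behind. In Python 3 a file opened in binary mode must be given bytes, hence the b"..." literals; in Python 2, as in the question, a plain str works.

```python
import os
import tempfile

# Use a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt")
    existed_before = os.path.exists(path)  # False: nothing created yet

    # "wb" creates the file if it is missing...
    with open(path, "wb") as fo:
        fo.write(b"first run\n")

    # ...and truncates it to zero length if it already exists.
    with open(path, "wb") as fo:
        fo.write(b"second run\n")

    existed_after = os.path.exists(path)
    with open(path, "rb") as fi:
        contents = fi.read()

print(existed_before, existed_after)  # False True
print(contents)                        # b'second run\n'
```

Note that "wb" never appends: each open discards whatever the file held before. Use "ab" if you want to add to an existing file instead.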

EDIT (based on your edit)

The reason you are getting "EOL while scanning string literal" is that the last backslash escapes the closing apostrophe (\'), so the string never ends. Use a backslash to escape the backslash preceding the apostrophe, i.e. \\'
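A small Python 3 sketch that reproduces the error with compile() and compares three ways to end a Windows path in a backslash. One subtlety: in a raw string the escaping backslash is kept, so r'...\\' ends in two backslashes, not one. The exact SyntaxError message depends on the Python version (older versions say "EOL while scanning string literal"; newer ones word it differently).

```python
# Source text for one line of Python: p = r'C:\dir\'
# (in this ordinary string literal, "\\" stands for a single backslash)
bad_source = "p = r'C:\\dir\\'\n"

try:
    compile(bad_source, "<example>", "exec")
    raised = False
except SyntaxError as exc:
    # The final backslash escapes the closing quote, so the literal
    # runs to the end of the line without terminating.
    raised = True
    print("SyntaxError:", exc.msg)

# Three literals that do compile:
normal = 'C:\\Users\\xxx\\decisiontxt\\'       # ordinary string, doubled backslashes
raw_doubled = r'C:\Users\xxx\decisiontxt\\'    # raw string: both backslashes are kept
raw_plus = r'C:\Users\xxx\decisiontxt' + '\\'  # raw string plus one escaped backslash

print(normal == raw_plus)            # True: both end in exactly one backslash
print(raw_doubled == normal + '\\')  # True: the raw version ends in two
```

Windows generally tolerates the doubled separator that the raw-string version produces, so the fix works in practice; the other two forms give a path with a single separator.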

Raghav RV

Even though you're using a raw string, you still need to escape the last \:

open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\\' + newfn + ".txt","wb")

See "Python raw strings and trailing backslash" for details.
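A sketch of a different approach (not the one this answer uses): build the output path with `join`, so no string literal has to end in a backslash, and drop the extension with `splitext` instead of `re.sub(".pdf", "", filename)`, where `.` is a regex wildcard and every match anywhere in the name is removed. `txt_path_for` is a hypothetical helper name. `ntpath` is the Windows implementation of `os.path`, used here only so the example gives the same result on any OS; on Windows, `os.path.join` and `os.path.splitext` behave the same way.

```python
import ntpath  # Windows flavour of os.path, importable on any platform


def txt_path_for(pdf_filename,
                 out_dir=r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt'):
    """Hypothetical helper: map a PDF file name to a .txt path in out_dir."""
    # basename drops any directory part of the input name;
    # splitext removes only the final extension (".pdf").
    base, _ext = ntpath.splitext(ntpath.basename(pdf_filename))
    # join inserts the backslash separator itself, so out_dir needs no
    # trailing backslash.
    return ntpath.join(out_dir, base + ".txt")


print(txt_path_for("PDFfromDocumentum.pdf"))
# C:\Users\xxx\PycharmProjects\untitled\decisiontxt\PDFfromDocumentum.txt
```

The returned path can then be passed straight to open(..., "wb") in pdftotxt.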

RomanHotsiy