I'm having some trouble writing a script in Python 2.7 on Windows. In part of the script, I need to compose a Windows file path from a directory and a filename with extension. It works fine when I write the whole path as a single string, but I get an error when I build it by joining the two parts. I think it might have something to do with spaces in the paths.

Here is a code section that works

filepath = os.path.normpath("C:/Users/jpettit/documents/projects/vendor files script/test files/122484.pdf")

print find_filename(filepath)

And here is the code section that doesn't work

directory_path = os.path.normpath("C:/Users/jpettit/documents/projects/vendor files script/test files")

file = "122484.pdf"

filepath = os.path.join(directory_path, file)
print find_filename(filepath)

I can't see what the difference between these two would be. Here's the code in the context of the entire script.

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from cStringIO import StringIO
import re
import os


def convert_pdf_to_txt(path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = file(path, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos=set()
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password,caching=caching, check_extractable=True):
        interpreter.process_page(page)
    fp.close()
    device.close()
    str = retstr.getvalue()
    retstr.close()
    return str

def find_filename(filepath):
    try:
        filenumberlocation = re.search('\d\d\d\d\d\d\.pdf',filepath, re.IGNORECASE)
        filenumber = filenumberlocation.group()[:6]
        print filepath
        pdfconverted = convert_pdf_to_txt(filepath)
        revlocation = re.search('REV #\n....',pdfconverted)
        rev = revlocation.group()[-4:]
        new_filename = filenumber + ' ' + rev + '.pdf'
        return new_filename
    except AttributeError:
        return os.path.basename(filepath)

def list_files(directory_path):
    filenames_list = []
    for dirpath, dirnames, filenames in os.walk(directory_path):
        filenames_list.extend(filenames)    
    return filenames_list

directory_path = os.path.normpath("C:/Users/jpettit/documents/projects/vendor files script/test files")

file_list = list_files(directory_path)

for file in file_list:
    filepath = os.path.join(directory_path, file)
    os.rename(filepath, os.path.join(directory_path, find_filename(file)))

The error that I get is the following:

Traceback (most recent call last):
  File "revfind.txt", line 59, in <module>
    os.rename(filepath, os.path.join(directory_path, find_filename(file)))
  File "revfind.txt", line 34, in find_filename
    pdfconverted = convert_pdf_to_txt(filepath)
  File "revfind.txt", line 16, in convert_pdf_to_txt
    fp = file(path, 'rb')
TypeError: 'str' object is not callable

As you can probably tell, I'm very new at this, and would really appreciate any guidance!

smci
JDP
  • Wow... never seen someone name a *Python* file with a `.txt` extension before :) – Jon Clements Feb 27 '15 at 08:43
  • Well don't assign `file` to a string if you want to use it to open a file later on. Or better use `open()` for opening files. – Ashwini Chaudhary Feb 27 '15 at 08:53
  • @AshwiniChaudhary is right: the `for file in file_list:` statement rebinds `file` as a global variable. After that, every call to `file()` tries to call that string instead of the built-in function. You can still reach the built-in `file` via `__builtins__.file`, but that is very dirty. – myaut Feb 27 '15 at 08:59
  • Don't shadow the builtin [**file object**](https://docs.python.org/2/library/stdtypes.html#file-objects)! Rename your variable from `file`. Don't ever call your list `list`, don't call your string `string`, don't call your dict `dict`, don't call your set `set`, don't call your operator `operator`... you can call your function `function` (in Python), but you still shouldn't. – smci Feb 27 '15 at 09:32
  • ...don't call your object `object`, and don't call your type `type`, to flog the horse comprehensively. Please read questions like http://stackoverflow.com/questions/14595922/list-of-python-keywords – smci Feb 27 '15 at 09:41
  • In honor of this I asked ["Weirdest obfuscated Python code which intentionally shadows builtins, to bizarre effect?"](http://codegolf.stackexchange.com/questions/47156/weirdest-obfuscated-python-code-which-intentionally-shadows-builtins-to-bizarre) – smci Feb 27 '15 at 10:38

3 Answers

The error message says that you are shadowing the file builtin - that's the effect of this line:

file = "122484.pdf"

From that point on, file is a string rather than the builtin. The general advice for this problem is "don't do that", i.e. pick another name (calling it filename instead might be a good choice). In this case, though, the better fix is to call the open function instead: in Python 2, file is the built-in file type, the documentation recommends open for opening files, and file has been removed in Python 3. So, where you do this:

fp = file(path, 'rb')

do this instead:

fp = open(path, 'rb')

Once the script calls open, a variable named file no longer breaks anything, so this particular case of shadowing a builtin happens to be safe and is probably unsurprising. You should still be careful about shadowing builtins in general.
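A minimal sketch of that fix, written so that it runs on Python 2.7 and Python 3 alike. The helper name read_bytes, the temporary directory, and the dummy file contents are assumptions made up for illustration; the real script would hand the opened file to pdfminer instead.

```python
import os
import shutil
import tempfile


def read_bytes(path):
    # open() is a different name from `file`, so a variable called `file`
    # elsewhere in the script cannot interfere with this call.
    with open(path, 'rb') as fp:
        return fp.read()


# Hypothetical stand-in for the question's "test files" directory.
directory_path = tempfile.mkdtemp()
try:
    for file in ["122484.pdf"]:  # same loop variable name as in the question
        filepath = os.path.join(directory_path, file)
        # Create a dummy file so there is something to read back.
        with open(filepath, 'wb') as out:
            out.write(b"dummy contents")
        data = read_bytes(filepath)
finally:
    # Clean up the temporary directory again.
    shutil.rmtree(directory_path)
```

Renaming the loop variable to something like filename is still the tidier choice, since it avoids the question of shadowing altogether.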

lvc

You're shadowing the builtin file object

Don't do that. Call your variable f, myfile, thefile or whatever.

Don't shadow names of builtins. Don't ever call your list list, don't call your string string, don't call your dict dict, don't call your set set, don't call your operator operator... you can call your function function (in Python), but you still shouldn't. ...don't call your object object, and don't call your type type.
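The failure mode can be reproduced in a few lines. This sketch shadows list rather than file so that it also runs on Python 3, where file no longer exists; the function name to_chars is made up for the example.

```python
def to_chars(text):
    # `list` is looked up when this line runs, not when the function
    # is defined, so a later rebinding of the name is picked up here.
    return list(text)


list = "122484.pdf"  # rebinding the name hides the builtin, as `file = ...` did

try:
    to_chars("ab")
    error_message = None
except TypeError as exc:
    # Calling a str raises the same kind of error as in the question.
    error_message = str(exc)
finally:
    del list  # drop the shadowing name again

print(error_message)
```

The function itself never changed; only the name it relies on was rebound, which is why the traceback points at a line that looks perfectly innocent.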

smci

The difference has to be in the string passed to find_filename, which is not the same in the two versions.

In your first version, the one that works, the slashes in the directory are all forward slashes, including the one before the filename:

C:/Users/jpettit/documents/projects/vendor files script/test files/122484.pdf

On Windows, os.path.join without a trailing slash on the directory will insert a backslash before the filename, resulting in:

C:/Users/jpettit/documents/projects/vendor files script/test files\122484.pdf

If you add a forward slash to the end of the directory name, the os.path.join will produce the same result as your example that worked.

Edward
  • Possibly true but not the problem here. `file` is being assigned to a string and overriding the builtin function – Holloway Feb 27 '15 at 09:18
  • Windows correctly handles mixed slashes - `C:/Users/jpettit/documents/projects/vendor files script/test files\122484.pdf` is *unusual*, but is a valid path. Even if it wasn't, this isn't the main problem as it cannot possibly generate the particular error given in the question. – lvc Feb 27 '15 at 09:25
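The path behaviour discussed in this answer and its comments can be checked without a Windows machine, because the standard library's ntpath module is the Windows implementation of os.path and can be imported on any platform. A minimal sketch, reusing the question's paths as sample data; the shortened "C:/test files" directory is a hypothetical value chosen only to keep the line readable.

```python
import ntpath  # the Windows flavour of os.path, importable on any OS

# The two ways the question builds the path.
full = ntpath.normpath(
    "C:/Users/jpettit/documents/projects/vendor files script/test files/122484.pdf")
directory = ntpath.normpath(
    "C:/Users/jpettit/documents/projects/vendor files script/test files")
joined = ntpath.join(directory, "122484.pdf")

# normpath rewrites forward slashes as backslashes, so both routes agree.
same_path = (full == joined)

# Without normpath, join inserts a backslash and yields a mixed-slash path.
mixed = ntpath.join("C:/test files", "122484.pdf")
# Normalising the mixed path turns every separator into a backslash.
tidied = ntpath.normpath(mixed)

print(same_path)
print(mixed)
print(tidied)
```

Because the question passes the directory through os.path.normpath before joining, both of its snippets end up with the same string, which is consistent with the comments that the slashes are not the cause of the TypeError.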