I'm using Python 3.6. I am trying to read a lot of (.txt) files in multiple directories. Some files have a comma in the file name, e.g. 'Proposal for Anne, Barry and Carol.txt'.

The following code:

import glob
import re

for filepath in glob.iglob(params.input_dir + r'\**\**.*', recursive=True):
    # [not shown here: code that filters on .txt filetype]

    with open(filepath) as f:
        for line in f:
            for word in re.findall(r'\w+', line):
                pass  # do stuff

Gives me an error on reading that file:

Traceback (most recent call last):
  File "dir_scraper.py", line 50, in <module>
    results_new = scraper.scrape_file(filepath)
  File "C:\Projects\scraper.py", line 33, in scrape_file
    return func(filepath)
  File "C:\Projects\scraper.py", line 15, in txt
    with open(filepath) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'Z:\\groups\\Proposal for Anne, Barry and Carol.txt'

I do not want to edit the names of the files.

How can I properly read the files with commas in the filenames?

Edit:

  • I'm sure the path exists.

  • Other files from the same directory are parsed without issues.

  • Trying to open the file directly from the command line also gives: The system cannot find the path specified.

  • Also, I seem to be unable to rename the file. If I try to change the name through Windows File Explorer to remove the comma (or change something else), it is reset to the original filename.

  • Could it have something to do with file permissions?

  • Or maybe the filename is too long? The full path from Z:[..] to [..].txt is 270 characters long.
Håken Lid
Phantom
  • I cannot reproduce this behavior with Python 3.6.3. Can you show where the variable filepath is set? – elzell Nov 20 '18 at 10:14
  • Maybe if you use `listdir` on the directory you can see what the file is actually called. – khelwood Nov 20 '18 at 10:15
  • Check the file name correctly, we don't usually need to escape/handle comma names in the file name or any parameter string. – Shariq Nov 20 '18 at 10:18
  • Are you sure your path `Z:\\groups` exists ? – Dinko Pehar Nov 20 '18 at 10:21
  • I'm sure the path exists. Other files from the same directory are parsed without issues. Directly from the commandline, trying to open the file also gives: `The system cannot find the path specified.` Also, I seem to be unable to rename the file, if I try to change the name through Windows File Explorer to remove the comma (or change something else), it is reset to the original filename. – Phantom Nov 20 '18 at 10:27
  • I am able to access the folder with a comma in it! Why can't you do so? – Sandesh34 Nov 20 '18 at 10:29
  • It might be that the comma is a red herring! (I thought that was the culprit since it was the only file in the folder with a comma, and the only one giving an error) Could it have something to do with file permissions? – Phantom Nov 20 '18 at 10:32
  • No, because I already tried creating the folder in my system directory. So, make sure that the file path is correct – Sandesh34 Nov 20 '18 at 10:34
  • I added some information in my post. It might have to do with file permissions or the length of the path? (in my pasted error example, I truncated the file path because of privacy) – Phantom Nov 20 '18 at 10:41
  • Can you provide some more chunk of code? – Sandesh34 Nov 20 '18 at 10:44
  • As for code: There really isn't that much else to it. I added one line, though, that shows where the filepath comes from. (It actually calls a function instead of directly doing the `with [..]` bit, but this is effectively what happens.) – Phantom Nov 20 '18 at 10:48
  • Is it possible to read the file's content through the windows explorer ? – Alekos Nov 20 '18 at 11:02
  • Yes if I open the file I can read the contents. – Phantom Nov 20 '18 at 11:13

2 Answers

1

This works fine on Python 3, Windows 10:

import glob, re
for filepath in glob.iglob('C:/Users/test-ABC/Desktop/test/' + r'\**\**.*', recursive=True):
    with open(filepath) as f:
        print(f)
        for line in f:
            print(line)
            for word in re.findall(r'\w+', line):
                pass

<_io.TextIOWrapper name='C:/Users/test-ABC/Desktop/test\\loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong name\\another looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong name\\test, file, name.txt' mode='r' encoding='cp1251'>

line1 
line2
line3

Maybe the problem is the long path. Check questions like this one: Long paths in Python on Windows
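
To check whether path length is really the culprit, something like the following rough sketch could help. It walks a directory tree and reports files whose absolute path reaches the legacy 260-character limit. The function name `find_long_paths` and the `limit` parameter are made up for illustration, and this is only a diagnostic, not a fix.

```python
import os

MAX_PATH = 260  # legacy Windows limit (this count includes the terminating NUL)


def find_long_paths(root, limit=MAX_PATH):
    """Yield (length, path) for each file under root whose absolute path
    is at least `limit` characters long. Hypothetical helper for diagnosis."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                yield len(full), full
```

Calling `for length, path in find_long_paths(params.input_dir): print(length, path)` would then list the suspicious files, commas or not.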

VictorDDT
  • The manifest of Python 3.6+ [supports long paths](https://docs.python.org/3/using/windows.html#removing-the-max-path-limitation), so if you have "LongPathsEnabled" set in "HKLM\System\CurrentControlSet\Control\FileSystem" in Windows 10, then normalized DOS paths support the native limit of up to about 32760 characters. Otherwise normalized DOS paths use the legacy limit of `MAX_PATH` (260) characters, and longer paths require an extended local-device path, which is prefixed with "\\?\" (or "\\?\UNC\" for UNC) and must be fully qualified (i.e. not relative) and Unicode. – Eryk Sun Nov 20 '18 at 22:05
  • Thank you @eryksun. Will note that. – VictorDDT Nov 21 '18 at 21:59
  • Thank you! It turned out that the path was indeed too long. The comma threw me off. I'll have to look into how best to support the long path. Thanks @eryksun for the suggestion, I'll see if that works. – Phantom Dec 06 '18 at 11:49
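
For readers who cannot enable "LongPathsEnabled", the extended-path prefix described in the comment above could be applied with a small helper like this one. It is a minimal sketch, not a library API: the name `to_extended_path` is hypothetical, it uses `ntpath` so the string handling behaves the same on any platform, and it assumes the caller passes a fully qualified path, since the prefix does not work with relative ones.

```python
import ntpath


def to_extended_path(path):
    """Return `path` in Windows extended-length form (hypothetical helper).

    Drive paths get the \\\\?\\ prefix and UNC paths get \\\\?\\UNC\\.
    """
    # Already an extended or device path: leave it untouched.
    if path.startswith(("\\\\?\\", "\\\\.\\")):
        return path
    if not ntpath.isabs(path):
        raise ValueError("extended paths must be fully qualified: %r" % path)
    # Extended paths skip normalization by Windows, so do it here:
    # forward slashes become backslashes and '..' segments are resolved.
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path
```

In the loop from the question this would be used as `open(to_extended_path(filepath))`. I have not tested it against the mapped Z: drive from the question, so treat it as a starting point.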
0

First, you should only open files, not directories. Second, on Windows you can use os.path.join to build the path:

>>> os.path.join(r"d:\ss")
'd:\\ss'

Try this:

    from pathlib import Path
    import re

    path_name = './'  # e.g. r'd:/xx' on Windows
    # Collect every .txt file below path_name, skipping directories.
    fn_list = [p for p in Path(path_name).glob('**/*.txt') if not p.is_dir()]
    print(fn_list)
    for fn in fn_list:
        with open(fn) as f:
            print()
            print(fn)
            for line in f:
                # Print each word of the line, separated by "|".
                for word in re.findall(r'\w+', line):
                    print(word, end="|")

Output:

[PosixPath('2.txt'), PosixPath('1.txt')]


2.txt
This|tutorial|introduces|the|reader|informally|to|the|basic|concepts|and|features|of|the|Python|language|and|system|It|helps|to|have|a|Python|interpreter|handy|for|hands|on|experience|but|all|examples|are|self|contained|so|the|tutorial|can|be|read|off|line|as|well|
1.txt
Python|is|an|easy|to|learn|powerful|programming|language|It|has|efficient|high|level|data|structures|and|a|simple|but|effective|approach|to|object|oriented|programming|Python|s|elegant|syntax|and|dynamic|typing|together|with|its|interpreted|nature|make|it|an|ideal|language|for|scripting|and|rapid|application|development|in|many|areas|on|most|platforms|
myhaspldeep