
I am trying to open a set of files one by one to process their data. However, opening some of them fails with a FileNotFoundError, and I don't know what could be causing this.

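import os

# Directory containing this script (not necessarily the current working directory)
sd = os.path.dirname(os.path.abspath(__file__))

# Collect every .csv file under <script dir>/<path>
file_names = []
for root, d_names, f_names in os.walk(os.path.join(sd, path)):
    for f in f_names:
        if f.endswith('.csv'):
            file_names.append(os.path.join(root, f))

for f_name in file_names:
    with open(f_name, 'r') as file:
        ...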

I have also tried the following approach, using pathlib:

import pathlib

input_path = pathlib.Path(path)
file_names = input_path.glob('**/*.csv')  # recursive search for .csv files

for f_name in file_names:
    with open(f_name.resolve(), 'r') as file:
        ...

Both methods yield the same result.

'path' is the name of a directory that sits in the same directory as the script. The path shown in the error message looks correct. The files sit in a fairly deep directory structure, and some of the filenames are quite long.

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\...

To give a bit more insight, here is a simplified representation of the file structure under path:

path
¦-dir1
¦¦-dir2
¦¦¦-dir3
¦¦¦¦-sub1
¦¦¦¦¦-file-1a
¦¦¦¦-sub2
¦¦¦¦¦-file-1b
¦¦¦¦¦-file-2b

What I've found by testing is that when I replace path with dir3 to remove unnecessary traversal, the script processes file-1a (the only file in sub1) and file-1b, but raises the same error when it reaches file-2b. Furthermore, when I make sub2 the target instead, it processes all files inside sub2 with no issues.

Also, as suggested, I tried adding the line print(os.access(f_name, os.R_OK), repr(f_name)) just before attempting to open the file. It prints False (followed by the file path) every time just before the error is raised, and True whenever a file is processed successfully.

ABenju
  • why don't you use [pathlib](https://docs.python.org/3/library/pathlib.html). with `pathlib` you can use `Path(path).glob('*.csv')` – deadshot Aug 17 '20 at 10:14
  • The script dir isn't necessarily the same as the cwd. Get the script dir like this: `sd = os.path.dirname(os.path.abspath(__file__))`. Then do `os.walk(os.path.join(sd, path))`. There is no need to join the paths again when you open the files, since they should already be absolute. – ekhumoro Aug 17 '20 at 10:27
  • @deadshot Thanks, I like that implementation. Unfortunately, it doesn't seem to fix the error. – ABenju Aug 17 '20 at 10:42
  • What error are you getting when you are using `pathlib`? – deadshot Aug 17 '20 at 10:44
  • @ekhumoro I see your point. I just tried it out however and the result unfortunately seems to be the same. I've been checking the paths manually and they look correct. Thanks for the tip though. – ABenju Aug 17 '20 at 10:46
  • @deadshot Exactly the same error. I also tried `.resolve()` but nothing different apart from it being an absolute path. – ABenju Aug 17 '20 at 10:51
  • `Path.glob()` doesn't throw an error if the path does not exist; it returns an empty generator. – deadshot Aug 17 '20 at 10:54
  • @deadshot Right, I tried iterating through as a test and it seems to be working fine, returning paths to the files I expected. The problem is when I try to open them for some reason. – ABenju Aug 17 '20 at 11:00
  • can you share the code you have tried – deadshot Aug 17 '20 at 11:01
  • @deadshot I've updated the question with the latest version of both methods I've tried, one of them being the pathlib approach. – ABenju Aug 17 '20 at 11:14
  • try this `for file in file_names: with file.open() as fp:` – deadshot Aug 17 '20 at 11:18
  • @deadshot Unfortunately, same error. – ABenju Aug 17 '20 at 11:24
  • @ABenju I just did a sanity check on the first example and it works fine for me on linux. That is, I created a directory called "tmp" with some csv files in it, put it in the same directory as the script, and then set `path = 'tmp'`. All the files were read without any errors. – ekhumoro Aug 17 '20 at 15:54
  • @ABenju As an extra test, please add the line `print(os.access(f_name, os.R_OK), repr(f_name))` to the beginning of the second for-loop and show the output you get. – ekhumoro Aug 17 '20 at 15:58
  • @ekhumoro I've updated the question to show the results from your suggestion as well as a strange finding I made by testing. I'll try on a different system when I get a chance, I'm starting to think it might have something to do with it. I've also tried moving the entire project to a different directory just in case, with no result. – ABenju Aug 18 '20 at 10:41
  • @ABenju For backward-compatibility reasons, Windows uses a maximum path length of 260 characters. That's partly why I asked you to show the `repr` of the file names. Please always show the full results in your question when asked for debugging output. See [this question](https://stackoverflow.com/q/1880321/984421) for more information on this issue and some possible work-arounds. – ekhumoro Aug 18 '20 at 11:12
  • @ekhumoro apologies, I'll keep it in mind for other questions. You seem to be spot on! I wasn't aware of the fact. Thanks. – ABenju Aug 18 '20 at 12:44

1 Answer


Many thanks to @ekhumoro for pointing me in the right direction.

It seems my paths were longer than 260 characters, which is by default not allowed by Windows for backwards-compatibility reasons.
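One way to confirm a diagnosis like this is to measure the absolute paths before opening them. The following is only a minimal sketch: the names `MAX_PATH`, `is_too_long` and `report_long_csv_paths` are made up for illustration, and it assumes the classic limit of 260 characters, which includes the terminating null character, so the longest usable path is 259 characters.

```python
import os

# Classic Windows limit; the count includes the terminating null character.
MAX_PATH = 260


def is_too_long(path, limit=MAX_PATH):
    """Return True if the absolute form of `path` does not fit within `limit`."""
    # Usable length is limit - 1 because one slot is reserved for the null.
    return len(os.path.abspath(path)) >= limit


def report_long_csv_paths(top, limit=MAX_PATH):
    """Walk `top` and return the .csv paths whose absolute form is too long."""
    long_paths = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.endswith('.csv'):
                full = os.path.join(root, name)
                if is_too_long(full, limit):
                    long_paths.append(full)
    return long_paths
```

If the paths that `os.access` reports as unreadable are the same ones this function returns, the length limit is a likely cause.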

I changed the Windows registry to allow long paths and now my script has no issues accessing all the files in the structure.
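Where changing the registry is not an option, a commonly used alternative (not what was done here) is to pass Windows an extended-length path by prefixing an absolute path with `\\?\`. The sketch below is an assumption-laden illustration: `to_extended_length` is a hypothetical helper, and it expects a path that is already absolute, uses backslashes, and contains no `.` or `..` components, because Windows skips normalization for prefixed paths.

```python
def to_extended_length(abs_path):
    r"""Return `abs_path` with the Windows extended-length prefix added.

    Drive paths become \\?\C:\... and UNC paths become \\?\UNC\server\share\...
    The input is assumed to be absolute and already normalized.
    """
    prefix = '\\\\?\\'  # the four characters \\?\
    if abs_path.startswith(prefix):
        return abs_path  # already prefixed, nothing to do
    if abs_path.startswith('\\\\'):
        # UNC path: replace the leading \\ with \\?\UNC\
        return prefix + 'UNC\\' + abs_path[2:]
    return prefix + abs_path
```

On Windows, something like `open(to_extended_length(str(f_name.resolve())), 'r')` would then be the call to try; whether it helps depends on the Windows and Python versions in use.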

ABenju