File not found error, even though file was found

Question

Using the following bit of code:

for root, dirs, files in os.walk(corpus_name):
    for file in files:
        if file.endswith(".v4_gold_conll"):
            f= open(file)
            lines = f.readlines()
            tokens = [line.split()[3] for line in lines if line.strip() and not line.startswith("#")]
    print(tokens)

I get the following error:

Traceback (most recent call last):
  File "text_statistics.py", line 28, in <module>
    corpus_reading_pos(corpus_name, option)
  File "text_statistics.py", line 13, in corpus_reading_pos
    f= open(file)
FileNotFoundError: [Errno 2] No such file or directory: 'abc_0001.v4_gold_conll'

As you can see, the file was, in fact, located, but then when I try to open the file, it... can't find it?

Edit: using this updated code, it stops after reading 7 files, but there are 172 files.

def corpus_reading_token_count(corpus_name, option="token"):
    for root, dirs, files in os.walk(corpus_name):
        tokens = []
        file_count = 0
        for file in files:
            if file.endswith(".v4_gold_conll"):
                with open((os.path.join(root, file))) as f:
                    tokens += [line.split()[3] for line in f if line.strip() and not line.startswith("#")]
                    file_count += 1
    print(tokens)
    print("File count:", file_count)

You find that file in `corpus_name`, but you are opening it in the current working directory. — tobias_k, Dec 22 '17 at 12:43
So my corpus contains hundreds of files, I need to access only the files that end in ".v4_gold_conll" and extract the information. I'm not sure how I would go about that... — socrlax24, Dec 22 '17 at 12:46

tobias_k · Accepted Answer · 2017-12-22T13:34:48.270

2

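`file` is just the filename, without the directory that contains it; that directory is `root` in your code. `open(file)` therefore looks for the file in the current working directory. Try this: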

f = open(os.path.join(root, file))
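
To see why the bare name fails, here is a minimal sketch that builds a throwaway directory tree and compares what `os.walk` yields with the joined path. The folder names and the placeholder file content are made up for illustration only.

```python
import os
import tempfile

# Hypothetical layout for illustration: <tmp>/corpus/sub/abc_0001.v4_gold_conll
with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "corpus", "sub")
    os.makedirs(sub)
    with open(os.path.join(sub, "abc_0001.v4_gold_conll"), "w") as f:
        f.write("placeholder\n")

    found = []
    for root, dirs, files in os.walk(os.path.join(tmp, "corpus")):
        for name in files:
            full_path = os.path.join(root, name)
            # `name` carries no directory part; only `full_path` points at the file.
            found.append((name, os.path.dirname(name), os.path.exists(full_path)))
```

Each entry in `found` pairs the bare name with its (empty) directory part and with whether the joined path exists, which is the distinction the fix relies on.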

Also, it is better to open the file using `with`, and to avoid `file` as a variable name, since it shadows the built-in type of that name in Python 2. And, judging from your comment, you should probably extend the list of tokens across files (use += instead of =):

tokens = []
for root, dirs, files in os.walk(corpus_name):
    for filename in files:
        if filename.endswith(".v4_gold_conll"):
            with open(os.path.join(root, filename)) as f:
                tokens += [line.split()[3] for line in f if line.strip() and not line.startswith("#")]
print(tokens)
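
For the follow-up about counting files, the same loop could be wrapped in a function along these lines. This is only a sketch: the name `collect_tokens`, the `suffix` parameter, and the returned file count are illustrative assumptions rather than part of the original code, and rows with fewer than four columns are skipped here as an extra precaution. The point it shows is that both accumulators are created once, before the walk starts, so they are not reset for each subfolder.

```python
import os

def collect_tokens(corpus_name, suffix=".v4_gold_conll"):
    """Return (tokens, file_count) for every matching file under corpus_name."""
    tokens = []       # created once, before os.walk starts
    file_count = 0
    for root, dirs, files in os.walk(corpus_name):
        for filename in files:
            if filename.endswith(suffix):
                with open(os.path.join(root, filename)) as f:
                    for line in f:
                        columns = line.split()
                        # Skip blank lines, comment lines, and short rows.
                        if len(columns) > 3 and not line.startswith("#"):
                            tokens.append(columns[3])
                file_count += 1
    return tokens, file_count
```

Calling `tokens, file_count = collect_tokens(corpus_name)` should then cover matching files at every depth, assuming they all live somewhere under `corpus_name`.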

edited Dec 22 '17 at 13:34

answered Dec 22 '17 at 12:45

tobias_k


Thank you, this worked, but after replacing this part, it only extracted the tokens from one file, and not all of the files?? – socrlax24 Dec 22 '17 at 12:47
Now I'm having a problem where it is stopping after having only opened 7 files. There should be 172 files in the folder and subfolders. – socrlax24 Dec 22 '17 at 13:07
Well, there is no reason it should "stop" after seven files. Are you sure those files have the correct extension? Remember that string comparisons in Python, including `endswith`, are case-sensitive. What happens if you add an `else` to the `if`? – tobias_k Dec 22 '17 at 13:26
the files I am wanting to open all have the identical ending, ".v4_gold_conll". Does it make a difference that they're located within different subfolders and subsubfolders? – socrlax24 Dec 22 '17 at 13:30
It looks like it's only reading the files that are in the very lowest subfolder, and none of the files in other subfolders. – socrlax24 Dec 22 '17 at 13:32
@socrlax24 I think you have to move the initialization of `tokens` one more level up; see my edit. – tobias_k Dec 22 '17 at 13:35

score 0 · Answer 2 · answered Dec 22 '17 at 12:46

You'll have to join the root with the filename.

for root, dirs, files in os.walk(corpus_name):
    for file in files:
        if file.endswith(".v4_gold_conll"):
            with open(os.path.join(root, file)) as f:
                tokens = [
                    line.split()[3]
                    for line in f
                    if line.strip() and not line.startswith("#")
                ]
            print(tokens)
