I want to open multiple files and count the words in each one, but I also want to know how many of the files couldn't be opened.

This is what I tried:

i = 0
def word_count(file_name):
    try:
        with open(file_name) as f:
            content = f.read()
    except FileNotFoundError:
        pass
        i = 0
        i += 1
    else:
        words = content.split()
        word_count = len(words)
        print(f'file {file_name} has {word_count} words.')


file_name = ['data1.txt','a.txt','data2w.txt','b.txt','data3w.txt','data4w.txt']
for names in file_name:
    word_count(names)
print(len(file_name) - i , 'files weren\'t found')
print (i)

So, I get this error:

runfile('D:/~/my')
file data1.txt has 13 words.
file data2w.txt has 24 words.
file data3w.txt has 21 words.
file data4w.txt has 108 words.
Traceback (most recent call last):

  File "D:\~\my\readtrydeffunc.py", line 27, in <module>
    print(len(file_name) - i , 'files weren\'t found')

NameError: name 'i' is not defined

I tried a few other things too, but I don't think I understand scopes well. I think the problem is that i is assigned outside the scope of the except block, but when I assign i = 0 inside the except block, I can't print it at the end, because it is destroyed once the function finishes.

Matt L.
Nima Metana
  • Have you tried removing the `i = 0` inside the except? – AMC May 11 '20 at 15:28
  • Do you know what the ``global`` keyword does? – MisterMiyagi May 11 '20 at 15:44
  • Does this answer your question? [UnboundLocalError on local variable when reassigned after first use](https://stackoverflow.com/questions/370357/unboundlocalerror-on-local-variable-when-reassigned-after-first-use) – MisterMiyagi May 11 '20 at 15:44

3 Answers

Yes, you're on the right track. You need to define and increment i outside the function, or pass the value into the function, increment it, and return the new value. Defining i outside the function is the more common and more Pythonic choice.

def count_words(file_name):
    with open(file_name) as f:
        content = f.read()
    words = content.split()
    return len(words)


file_names = ['data1.txt','a.txt','data2w.txt','b.txt','data3w.txt','data4w.txt']

i = 0  # number of files that could not be opened
for name in file_names:
    try:
        result = count_words(name)
    except FileNotFoundError:
        i += 1
    else:
        # only reached when the file was opened and counted
        print(f'file {name} has {result} words.')

print(i, "files weren't found")
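
The other option mentioned above, passing the counter into the function and returning the new value, might look roughly like the sketch below. The two-value return and the `missing` parameter name are assumptions made for illustration, not part of the original code.

```python
def count_words(file_name, missing):
    """Return (word_count, missing).

    `missing` is the running count of files not found so far.
    word_count is None when the file does not exist.
    """
    try:
        with open(file_name) as f:
            content = f.read()
    except FileNotFoundError:
        # No global state: hand the incremented counter back to the caller.
        return None, missing + 1
    return len(content.split()), missing


file_names = ['data1.txt', 'a.txt', 'data2w.txt']
missing = 0
for name in file_names:
    n_words, missing = count_words(name, missing)
    if n_words is not None:
        print(f'file {name} has {n_words} words.')
print(missing, "files weren't found")
```

This works, but the caller has to thread the counter through every call, which is why keeping the counter in the loop is usually simpler.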
Matt L.
  • Beware of using the same name for a function and its return value, as you do in `word_count`. Also, this code does not really fulfill the original intention of determining the number of words for the files that do exist, since `result` in the final loop gets overwritten at each iteration, before ever being used (printed or used otherwise). – norok2 May 11 '20 at 16:17
  • Point taken. Edited to fix that. – Matt L. May 11 '20 at 16:32
  • I believe you can do any action inside the except statement, but it will only trigger when that exception is raised. Also, if that action raises a new exception, you'll crash the program. – Matt L. May 11 '20 at 17:47

I would recommend breaking this into two functions: one to handle the word counting and a second to control the flow of the script. The control function should handle any errors that arise, as well as the feedback from those errors.

def word_count(file_name):
    with open(file_name) as f:
        content = f.read()
        words = content.split()
        word_count = len(words)
        print(f'file {file_name} has {word_count} words.')

def file_parser(files):
    i = 0
    for file in files:
        try:
            word_count(file)
        except FileNotFoundError:
            i+=1
    if i > 0:
        print(f'{i} files were not found')

file_names = ['data1.txt','a.txt','data2w.txt','b.txt','data3w.txt','data4w.txt']
file_parser(file_names)
  • Getting this refactored into functions is definitely a good idea. A less good idea is to use the same name for the function and one of its variables in `word_count()`. Also, the code would be much cleaner if the functions were to return the result of their computation, rather than printing it. Printing is useful, but renders the function less useful (e.g. if you were to do further computation or were to use a different UI). Hence, it is better to do it outside of the main purpose of the function. – norok2 May 11 '20 at 16:35

While refactoring your code to not use global variables should be the preferred approach (see the edit below for a possible refactoring), the minimal modification to get your code running is to remove pass and i = 0 from the except clause, and to declare i as global inside your function:

def word_count(file_name):
    global i  # use a `i` variable defined globally
    try:
        with open(file_name) as f:
            content = f.read()
    except FileNotFoundError:
        i += 1  # increment `i` when the file is not found
    else:
        words = content.split()
        word_count = len(words)
        print(f'file {file_name} has {word_count} words.')


i = 0
file_name = ['data1.txt','a.txt','data2w.txt','b.txt','data3w.txt','data4w.txt']
for names in file_name:
    word_count(names)
print(i, 'files weren\'t found')

Note that i will contain the number of files not found.


EDIT

A reasonable refactoring could look something like:

def word_count(filepath):
    result = 0
    with open(filepath) as file_obj:
        for line in file_obj:
            result += len(line.split())
    return result


def process_files(filepaths):
    result = {}
    num_missing = 0
    for filepath in filepaths:
        try:
            num_words = word_count(filepath)
        except FileNotFoundError:
            num_missing += 1
        else:
            result[filepath] = num_words
    return result, num_missing


filenames = [
    'data1.txt', 'a.txt', 'data2w.txt', 'b.txt', 'data3w.txt', 'data4w.txt']
wordcounts, num_missing = process_files(filenames)
for filepath, num_words in wordcounts.items():
    print(f'File {filepath} has {num_words} words.')
print(f"{num_missing} files weren't found")

Notes:

  • the word_count() function now does only one thing: word counting. It works line by line to better handle potentially long files, which could fill the memory if loaded all at once.
  • the process_files() function extracts the essential information and stores it in a dict.
  • all the printing of the results is done in one place, and could easily be wrapped in a main() function.
  • num_missing (roughly the former i) is now a local variable.

Finally, note that explicitly counting the exceptions is only one way to get this number; the other is to subtract the number of elements in result from the number of input filepaths. This can be done anywhere; there is no need to do it in process_files().
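
A minimal sketch of that subtraction approach, under the assumption that the input list contains no duplicate paths (a dict stores a repeated path only once, which would throw the count off):

```python
def word_count(filepath):
    """Count whitespace-separated words, reading the file line by line."""
    result = 0
    with open(filepath) as file_obj:
        for line in file_obj:
            result += len(line.split())
    return result


def process_files(filepaths):
    """Return {filepath: word count} for the files that could be opened."""
    result = {}
    for filepath in filepaths:
        try:
            result[filepath] = word_count(filepath)
        except FileNotFoundError:
            pass  # missing files simply do not get an entry
    return result


filenames = ['data1.txt', 'a.txt', 'data2w.txt']
wordcounts = process_files(filenames)
# Assumes `filenames` has no duplicates; a repeated path that exists would be
# stored once in the dict, and the subtraction would then overcount.
num_missing = len(filenames) - len(wordcounts)
print(f"{num_missing} files weren't found")
```

Here process_files() no longer needs a counter at all; the caller derives the number of missing files from the two lengths.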

norok2
  • Why is a global variable preferred here? They are avoided whenever possible, in my experience. – Matt L. May 11 '20 at 15:43
  • See, for example, https://stackoverflow.com/questions/176118/when-is-it-ok-to-use-a-global-variable-in-c – Matt L. May 11 '20 at 15:44