Calculating Statistics From File

Question

Write a function named file_stats that takes one string parameter (in_file) that is the name of an existing text file. The function file_stats should calculate three statistics about in_file: the number of lines it contains, the number of words, and the number of characters, and print the three statistics on separate lines. For example, the following would be correct input and output. (Hint: the number of characters may vary depending on what platform you are working on.)
file_stats('created_equal.txt')

lines 2

words 13

characters 72

Below is what I have:

fileName = "C:\Users\Jeff Hardy\Desktop\index.txt"
chars = 0
words = 0
lines = 0

def file_stats(in_file):
    global lines, words, chars
    with open(in_file, 'r') as fd:
        for line in fd:
            lines += 1
            wordsList = line.split()

        words += len(wordsList)

        for word in wordsList:
            chars += len(word)

file_stats(fileName)
print("Number of lines: {0}".format(lines))
print("Number of words: {0}".format(words))
print("Number of chars: {0}".format(chars))

The code is giving me the following error:

(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

fileName is just an example. The file can have any name you want, and its content can be anything you want; the question doesn't specify. It just has to calculate the statistics as described in the question. — , Mar 26 '18 at 01:50
`len(wordsList)` is being calculated and summed after you read all the lines. — OneCricketeer, Mar 26 '18 at 01:53
As a side note: [never use unescaped backslashes in non-raw strings](https://stackoverflow.com/questions/2953834/windows-path-in-python). You're lucky that `\i` happens not to be an escape sequence; don't count on that luck for your other paths. — abarnert, Mar 26 '18 at 02:03
Meanwhile, please give us the entire exception traceback, not just the first line. Without that, we can't tell where in your code you're getting that error, except by guessing. Read [mcve] in the help for more. — abarnert, Mar 26 '18 at 02:04
Finally, if the error is coming from reading the file, we can't debug that without knowing what's actually in the file. For a random example, if it's BOM-encoded UTF-16, the answer is to put `, encoding='utf-16'` on the end of your `open` call, but if it's something entirely different, that answer won't help you. Since the error seems to be from the first few bytes, just doing something like `with open(in_file, 'rb') as f: print(f.read(80))` (notice the `'rb'` mode) and showing us the output is probably sufficient. — abarnert, Mar 26 '18 at 02:07

OneCricketeer · Answer 1 · 2018-03-26T18:28:55.533

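The error comes from the path string rather than from the file's contents. In a normal (non-raw) string literal, the `\U` in `"C:\Users..."` starts an eight-digit Unicode escape, which is what "truncated \UXXXXXXXX escape" at position 2-3 refers to. Escape the backslashes,

fileName = "C:\\Users\\Jeff Hardy\\Desktop\\index.txt"

or use a raw string, `r"C:\Users\Jeff Hardy\Desktop\index.txt"`. Passing `encoding="utf-8"` to `open` is a separate precaution in case the file is not in your platform's default encoding.

Also, the instructions ask you to print within the function, not to modify global variables, and you need to update the counts inside the loop, not after it (indentation matters):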

def file_stats(in_file):
    lines = words = chars = 0
    with open(in_file, 'r', encoding="utf-8") as fd:
        for line in fd:
            lines += 1
            words += len(line.split())  # If you split "x , y" is the comma a word?
            chars += len(line)  # Are spaces considered a character?

    print("lines {0}".format(lines))
    print("words {0}".format(words))
    print("characters {0}".format(chars))
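
To check the counting logic without depending on a particular file, here is a minimal sketch that returns the three counts instead of printing them. The name `file_counts`, the temporary file, and the sample text are all assumptions made for illustration; the question does not show the contents of created_equal.txt, and this text was chosen only because it reproduces the numbers in the example.

```python
import os
import tempfile


def file_counts(in_file):
    """Hypothetical helper: return (lines, words, chars) instead of printing."""
    lines = words = chars = 0
    with open(in_file, "r", encoding="utf-8") as fd:
        for line in fd:
            lines += 1
            words += len(line.split())  # whitespace-separated tokens
            chars += len(line)          # includes spaces and the trailing "\n"
    return lines, words, chars


# Assumed sample text; the real created_equal.txt is not shown in the question.
sample = "We hold these truths to be self-evident,\nthat all men are created equal."

# Write the sample to a temporary file, count it, then clean up.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)

try:
    result = file_counts(tmp.name)
finally:
    os.remove(tmp.name)

print("lines {0}\nwords {1}\ncharacters {2}".format(*result))
```

With this sample the output is lines 2, words 13, characters 72. For the path in the question, the call would be `file_counts(r"C:\Users\Jeff Hardy\Desktop\index.txt")`, with the raw string avoiding the `\U` escape.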
