Calculating the average of numbers in txt file

Question

I'm trying to write a function in Python that reads a file, extracts the number after the colon on each line, and returns the average of those numbers.

(Simple formula for the average: sum of the numbers / count of the numbers.)

The file looks like this:

```
thenumber1:7

#more could be added

thenumber2: 4
```

(Yes, there is a space after the colon in the second line.)

Lines that start with "#" should be ignored.
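Applied to this example (the comment line is skipped), the formula gives the expected result:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{7 + 4}{2} = 5.5
$$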

My code so far:

```python
import os

def get_average_n(path):
    s = ""
    t = 0
    v = 0

    if not os.path.exists(path):
        return None

    if os.stat(path).st_size == 0:
        return 0.0

    else:
        p = open(path, "r")

        content = p.readlines()

        for line in content:
            if line.startswith("#"):
                continue

            elif not line.startswith("#"):
                x = line.find(":")
                s += line[x + 1:]
                h = s.replace("\n", "")

                for c in h:
                    if c.isdigit():
                        t += float(c)
                        v += 1
                        avg = round(t / v, 2)

    return avg

print(get_average_n("file.txt"))
```

The output should be 5.5 for the example above, but I'm getting wrong output and can't find the issue: it returns 6.25 instead of 5.5.
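A sketch that mirrors the question's loop on an in-memory list of lines (the name `question_loop` is hypothetical) shows the effect of nesting the digit loop inside the line loop: every non-comment line, including a blank one, rescans all the text collected so far, so earlier digits are counted again. The exact wrong value therefore depends on the file's layout; one layout that produces 6.25 is a single blank line before the comment.

```python
def question_loop(lines):
    """Mirror of the question's parsing loop, minus the file handling."""
    s, t, v = "", 0.0, 0
    avg = None
    for line in lines:
        if line.startswith("#"):
            continue
        x = line.find(":")      # -1 for a blank line, so the whole line is appended
        s += line[x + 1:]
        h = s.replace("\n", "")
        for c in h:              # nested: rescans everything collected so far
            if c.isdigit():
                t += float(c)
                v += 1
                avg = round(t / v, 2)
    return avg


print(question_loop(["thenumber1:7\n", "thenumber2: 4\n"]))  # 6.0
print(question_loop(["thenumber1:7\n", "\n", "#more could be added\n", "thenumber2: 4\n"]))  # 6.25
```

Moving the `for c in h` loop so it runs once after the line loop means each digit is counted only once.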

Unrelated: Why check `if line.startswith('#')` and then `elif not line.startswith('#')`? If it goes beyond the `if`, you _know_ that `line` doesn't start with `#` — Pranav Hosangadi, Oct 19 '20 at 17:32
Well, thanks, I didn't realize that. Now it works fine except when a number is a float, e.g. thenumber1: 3.25 #comment thenumber2:5 gives me 3.75 but should give 4.125 — Aru, Oct 19 '20 at 18:32

Mike67 · Answer 1 · 2020-10-19T17:41:30.010


To clear up my comment.

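I fixed the indentation and I get the correct result: the `for c in h` loop now runs once, after all lines have been read, instead of once per line.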

```python
ss = '''
thenumber1:7

#more could be added

thenumber2: 4
'''.strip()

with open('file.txt', 'w') as f:
    f.write(ss)

#################

import os

def get_average_n(path):
    s = ""
    t = 0
    v = 0

    if not os.path.exists(path):
        return None

    if os.stat(path).st_size == 0:
        return 0.0

    else:
        with open(path, "r") as p:
            content = p.readlines()

    for line in content:
        if line.startswith("#"):
            continue
        elif not line.startswith("#"):
            x = line.find(":")
            s += line[x + 1:]
            h = s.replace("\n", "")

    # This loop now runs once, after all lines are collected,
    # instead of rescanning `s` for every line.
    for c in h:
        if c.isdigit():
            t += float(c)
            v += 1
            avg = round(t / v, 2)

    return avg

print(get_average_n("file.txt"))  # 5.5
```

For a cleaner approach to parsing the file, try this:

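```python
import os

def get_average_n(path):
    numlst = []

    if not os.path.exists(path):
        return None

    if os.stat(path).st_size == 0:
        return 0.0

    else:
        with open(path, "r") as p:
            content = p.readlines()

    for line in content:
        # Skip comments and lines without a colon; parse the text after the
        # colon as one float, so "3.25" stays 3.25.
        if not line.startswith("#") and line.find(":") >= 0:
            numlst.append(float(line.split(':')[1].strip()))

    # Guard against files that contain only comments or blank lines.
    if not numlst:
        return 0.0

    avg = sum(numlst) / len(numlst)

    return avg
```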
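The same idea can be tried without creating a file by working on a list of lines. This is a minimal sketch: the name `average_from_lines` is hypothetical, it uses `str.partition` in place of the `find`/`split` pair, and returning `None` when no value is found is a choice made only for this sketch.

```python
from typing import Iterable, Optional


def average_from_lines(lines: Iterable[str]) -> Optional[float]:
    """Average the values after the colon, skipping lines that start with '#'.

    Returns None when no value is found (a choice made for this sketch).
    """
    numbers = []
    for line in lines:
        if line.startswith("#"):
            continue
        _, sep, value = line.partition(":")
        if sep:  # partition returns an empty separator when there is no colon
            numbers.append(float(value.strip()))
    if not numbers:
        return None
    return sum(numbers) / len(numbers)


print(average_from_lines(["thenumber1: 3.25\n", "#comment\n", "thenumber2:5\n"]))  # 4.125
print(average_from_lines(["thenumber1:7\n", "\n", "#more could be added\n", "thenumber2: 4\n"]))  # 5.5
```

Because each value is converted with a single `float()` call, the decimal point is kept and 3.25 is not split into separate digits.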

answered Oct 19 '20 at 17:28 by Mike67, edited Oct 19 '20 at 17:41


Thanks a lot, but it doesn't seem to work for numbers like 3.25. Could you help me with that as well? For example, if thenumber1:3.25 and thenumber2: 5, it should give back 4.00 but I'm getting 3.75. – Aru Oct 19 '20 at 18:25
I replaced the numbers with 3.25 and 5. The result is 4.125 which is correct. – Mike67 Oct 19 '20 at 18:42
You are right, thanks a lot, I tried again. But I don't understand why floats don't work with the first solution; only "the cleaner approach" handles them. – Aru Oct 19 '20 at 18:45
You are using `isdigit` to check each digit in the number. The decimal `.` is not a digit so it gets skipped. Your code only works with single digit integers. – Mike67 Oct 19 '20 at 18:52
Is there a way to detect the decimal point, other than changing the code the way you did? – Aru Oct 19 '20 at 18:53
In your code, you're merging the numbers into a single string. For numbers 1,2,3, you get "123", then you read each digit to sum 1+2+3. This won't work for multi-digit numbers or floats: 1,12,3.4 becomes "1123.4", which leads to 1+1+2+3+4, the wrong total. – Mike67 Oct 19 '20 at 19:08
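A short sketch of the difference described in these comments, using the values from Aru's example (the list `values` stands in for the text after each colon and is an assumption of this sketch). Scanning character by character sees the digits 3, 2, 5, 5, which reproduces the 3.75 reported above; converting each value with `float()` keeps the decimal point and gives 4.125.

```python
values = ["3.25", "5"]  # text after the colon on each line, already stripped

# Character by character, as in the question's loop: "." is skipped by isdigit().
digits = [float(c) for c in "".join(values) if c.isdigit()]
print(digits)                       # [3.0, 2.0, 5.0, 5.0]
print(sum(digits) / len(digits))    # 3.75

# One float() per value keeps the decimal point.
numbers = [float(v) for v in values]
print(numbers)                      # [3.25, 5.0]
print(sum(numbers) / len(numbers))  # 4.125
```

Keeping the values separate and converting each one as a whole is the simplest way to handle the decimal point; it is what the cleaner approach does with `float(line.split(':')[1].strip())`.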

hygull · Answer 2 · 2020-10-20T02:08:16.647

I have added a solution that uses regular expressions. I like regular expressions for their powerful find, search and replace features. See it below.

Note: Copy lines.py and file.txt into the same directory and run it. You will get 4.75, which is (7 + 4 + 3.25) / 3.

Reference links: Python Regex instantly replace groups, How to extract a floating number from a string

» file.txt

```
thenumber1:7

#more could be added

thenumber2: 4 

thenumber1: 3.25
```

» lines.py

```python
import os, re

def get_average_n(path):
    if not os.path.exists(path):
        return 0.0

    if os.stat(path).st_size == 0:
        return 0.0
    else:
        total, avg, count = 0.0, 0.0, 0
        regex = r"(.*):\s*(\d+(\.\d+)?)(.*)\s*" # Regex to match integers & floats like 3.25, 4.56
        with open(path, "r") as p:
            content = p.readlines()

        for line in content:
            if line.strip().startswith('#'):
                continue

            # Replace the matched line with capture group 2 (the number); lines the
            # regex does not match are returned unchanged, so blank lines strip to "".
            num_s = re.sub(regex, r'\2', line).strip()
            if num_s:
                total += float(num_s)
                count += 1

        if count:
            avg = total / count

    return avg


if __name__ == "__main__":
    print(get_average_n('file.txt'))
```

» Finally, run it using `python lines.py`.

Thanks.
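To see what the `re.sub` call does line by line, here is a sketch that applies the answer's regex to the sample lines: a matching line is rewritten to just capture group 2 (the number), and a line the regex does not match comes back unchanged. For comparison, `re.search` with a capture group is another common way to pull out the number; that variant, with the hypothetical helper name `number_after_colon`, is not part of the answer.

```python
import re
from typing import Optional

REGEX = r"(.*):\s*(\d+(\.\d+)?)(.*)\s*"  # the answer's regex

lines = ["thenumber1:7\n", "\n", "thenumber2: 4 \n", "thenumber1: 3.25\n"]

# re.sub rewrites a matching line to group 2; non-matching lines are unchanged.
extracted = [re.sub(REGEX, r"\2", line).strip() for line in lines]
print(extracted)  # ['7', '', '4', '3.25']


def number_after_colon(line: str) -> Optional[float]:
    """Hypothetical helper: search for the number instead of rewriting the line."""
    match = re.search(r":\s*(\d+(?:\.\d+)?)", line)
    return float(match.group(1)) if match else None


print([number_after_colon(line) for line in lines])  # [7.0, None, 4.0, 3.25]
```

The `re.search` form does not need to match the whole line, so it only has to describe the part it extracts.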

Thanks for your help, but now I need help with a corner case: the values of thenumber1 or thenumber2 can be floats, for example: thenumber1: 3.25 #morespace thenumber2: 5 — Aru, Oct 19 '20 at 18:36
Then you need to change the regex to match floats as well as integers. I have updated my answer: I changed `r"(.*):\s*(\d+)(.*)\s*"` to `r"(.*):\s*(\d+(\.\d+)?)(.*)\s*"` and updated `file.txt` to match your examples. If you need to match other formats, just update the regex the same way. — hygull, Oct 20 '20 at 02:05
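The effect of that regex change can be checked directly. With the old pattern, `\d+` stops at the decimal point and the trailing `(.*)` swallows `.25`, so 3.25 is silently read as 3; the optional `(\.\d+)?` group in the new pattern keeps the fraction.

```python
import re

OLD = r"(.*):\s*(\d+)(.*)\s*"
NEW = r"(.*):\s*(\d+(\.\d+)?)(.*)\s*"

line = "thenumber1: 3.25\n"
print(re.sub(OLD, r"\2", line).strip())  # 3
print(re.sub(NEW, r"\2", line).strip())  # 3.25
```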
