
I have a CSV file that contains the names of some series, their worth and their genre.

Example:

Descendant Without A Conscience,505.4,happy
Wolf Of The Solstice,30000,sad
Women Of Hope,-4000,neutral

I need to print a dictionary that gives the average worth of the series in each genre:

{'happy': 192421.475, 'sad': 1659412.5, 'neutral': 30733.5}

The only valid genres are happy, sad and neutral.

This is what I have tried:

d = {}
file_to_check = open('in_file.txt', 'r')
sum_for_happy = 0
sum_for_sad = 0
sum_for_neutral = 0
count_of_happy = 0
count_of_sad = 0
count_of_neutral = 0
for line in file_to_check:
    lst = []
    lst = line.rstrip().split(',')
    if lst[2] == 'happy':
        sum_for_happy += float(lst[1])
        count_of_happy += 1
        continue
    if lst[2] == 'sad':
        sum_for_sad += float(lst[1])
        count_of_sad += 1
        continue
    if lst[2] == 'neutral':
        sum_for_neutral += float(lst[1])
        count_of_neutral += 1
        continue
if sum_for_happy == 0 :
    value_for_happy = 'NA'
else:
    value_for_happy = sum_for_happy / count_of_happy
if sum_for_sad == 0 :
    value_for_sad = 'NA'
else:
    value_for_sad = sum_for_sad / count_of_sad
if sum_for_neutral == 0 :
    value_for_neutral = 'NA'
else:
    value_for_neutral = sum_for_neutral / count_of_neutral
d = {'happy':value_for_happy, 'sad':value_for_sad, 'neutral':value_for_neutral}
return d 

But no matter what values are in the CSV file, the output is always the same:

{'happy': 'NA', 'sad': 'NA', 'neutral': 'NA'}

It is as if the for loop is never entered at all, and I can't understand why.
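A side note on the code above, separate from the NA problem: deciding "NA" with `sum_for_happy == 0` also reports "NA" for a genre whose values happen to cancel out (e.g. 100 and -100). A minimal sketch that tests the count instead, assuming the same three-column layout; `average_by_genre` is a hypothetical name, and it takes any iterable of lines (an open file or an `io.StringIO`):

```python
import io

def average_by_genre(lines):
    """Hypothetical helper: average worth per valid genre from CSV-like lines."""
    sums = {'happy': 0.0, 'sad': 0.0, 'neutral': 0.0}
    counts = {'happy': 0, 'sad': 0, 'neutral': 0}
    for line in lines:
        parts = line.rstrip().split(',')
        # Only count well-formed rows whose genre is one of the valid ones
        if len(parts) == 3 and parts[2] in sums:
            sums[parts[2]] += float(parts[1])
            counts[parts[2]] += 1
    # Decide "NA" from the count, not the sum: values can add up to 0
    return {g: sums[g] / counts[g] if counts[g] else 'NA' for g in sums}

sample = io.StringIO("A,100,happy\nB,-100,happy\nC,7,sad\n")
print(average_by_genre(sample))
# {'happy': 0.0, 'sad': 7.0, 'neutral': 'NA'}
```

With the sum-based check, 'happy' would come out as 'NA' here even though it has two rows.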

martineau
  • try to use `pandas` – Gabio May 10 '20 at 16:54
  • Cannot reproduce. After removing the errant `return d` I got `{'happy': 505.4, 'sad': 30000.0, 'neutral': -4000.0}` – tdelaney May 10 '20 at 16:58
  • @MarkMeyer The format of the CSV file is that the first field is always the series name, the second is its worth and the third is the genre. So in the if condition I want to check what genre the series is, which is why I use `lst[2]` – violettagold May 10 '20 at 16:59
  • @MarkMeyer - um, "happy" is the third item in the list. – tdelaney May 10 '20 at 16:59
  • Not enough coffee. Sorry. Thanks @tdelaney. – Mark May 10 '20 at 17:00
  • @Gabip - Why? pandas is a tool not a cult. – tdelaney May 10 '20 at 17:00
  • @tdelaney Can you please explain to me what "reproduce" means? I am new to Python :) I put the `return d` there because it is supposed to be a function. – violettagold May 10 '20 at 17:03
  • The code you posted is not in a function - that's not a problem because this is an example, not real code. I created a file with your data, saved your example as a .py file and ran it. it worked - it did not produce the NA you see. That's what I mean by "could not reproduce" - your code plus your data gave me a different result. Since we see code working correctly, we can't really guess what the error is. Perhaps your real data has different columns? Or maybe there are no lines in your `in_file.txt` at all? You could check its size with `os.stat("in_file.txt").st_size` – tdelaney May 10 '20 at 17:22
  • @tdelaney My file is in exactly the format I wrote. Could the problem be that I open `file.csv` and not `file.txt`? Or could a try-except cause this? And thank you for the explanation, it helped :) – violettagold May 10 '20 at 17:52
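On the try-except question: one way to get exactly this symptom (a guess, the thread never confirms it) is that the file object was already read to the end before the loop, for example by a `read()` inside a try block. A file object is an iterator, so a second pass over it yields nothing and the loop body never runs. A minimal sketch using a temporary file as a stand-in for `in_file.txt`, also applying the `os.stat` size check suggested above:

```python
import os
import tempfile

# Write the sample rows to a temporary file (a stand-in for in_file.txt).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write("Descendant Without A Conscience,505.4,happy\n"
              "Wolf Of The Solstice,30000,sad\n"
              "Women Of Hope,-4000,neutral\n")
    path = tmp.name

size = os.stat(path).st_size  # non-zero: the file really has data

with open(path) as f:
    f.read()                             # e.g. an earlier read inside a try block
    seen_after_read = sum(1 for _ in f)  # 0: already at end of file, loop body never runs
    f.seek(0)                            # rewind to the beginning
    seen_after_seek = sum(1 for _ in f)  # 3: all rows are visible again

os.remove(path)
print(size > 0, seen_after_read, seen_after_seek)  # True 0 3
```

If this matches your real code, either call `seek(0)` before the loop or open the file fresh where you iterate over it.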

1 Answer


If your file has data, your for loop will run. You could shorten your code a bit:

# create the file like you posted it into the description of your question
with open("f.txt","w") as f: 
    f.write("""Descendant Without A Conscience,505.4,happy
Wolf Of The Solstice,30000,sad
Women Of Hope,-4000,neutral""")

and process it:

genre = ["happy", "sad", "neutral"]

# generate dictionary with the allowed keys and a list as default value
d = { g:[] for g in genre}

with open('f.txt') as f:
    for line in f:
        name, value, cat  = line.rstrip().split(',')
        if cat in d:
            # add the float value to your dictionaries list
            d[cat].append(float(value))

# sum the values in the lists and divide through list length - use "N/A" if list empty
sums = { cat:sum(data)/len(data) if data else 'N/A' for cat,data in d.items()}
print(sums)

Outputs:

{'happy': 505.4, 'sad': 30000.0, 'neutral': -4000.0}
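If a series name could ever contain a comma (quoted in the file), `split(',')` would produce too many fields and the unpacking into `name, value, cat` would fail. A variant of the code above using the standard `csv` module, which handles quoted fields, is sketched below; `averages_csv` is a hypothetical name, and the rest of the logic (lists per genre, 'N/A' for empty ones) is unchanged:

```python
import csv
import io

genre = ["happy", "sad", "neutral"]

def averages_csv(csv_file):
    d = {g: [] for g in genre}
    for row in csv.reader(csv_file):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        name, value, cat = row
        if cat in d:
            d[cat].append(float(value))
    # Same averaging as above: "N/A" when a genre has no rows
    return {cat: sum(data) / len(data) if data else 'N/A' for cat, data in d.items()}

data = io.StringIO('"Hello, World",10,happy\nWolf Of The Solstice,30000,sad\n\n')
print(averages_csv(data))
# {'happy': 10.0, 'sad': 30000.0, 'neutral': 'N/A'}
```

With a real file, the `csv` documentation recommends opening it with `newline=''`, e.g. `open('f.txt', newline='')`.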

If you use

genre = ["happy", "sad", "neutral", "for demonstrational purposes"]

you'll get

{'happy': 505.4, 'sad': 30000.0, 'neutral': -4000.0, 
 'for demonstrational purposes': 'N/A'}

printed.

If speed becomes a problem with your original data, you could use `defaultdict(list)` from the `collections` module instead of pre-building the dictionary.
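A minimal sketch of that `defaultdict(list)` variant, assuming the same three-column format; `averages_by_genre` and `VALID_GENRES` are hypothetical names. Note one behavioural difference: a genre with no rows is simply absent from the result instead of showing 'N/A':

```python
import io
from collections import defaultdict

VALID_GENRES = {"happy", "sad", "neutral"}

def averages_by_genre(csv_file):
    values = defaultdict(list)  # a missing genre key starts out as []
    for line in csv_file:
        name, value, cat = line.rstrip().split(',')
        if cat in VALID_GENRES:
            values[cat].append(float(value))
    # Only genres that actually occur in the file end up as keys
    return {cat: sum(data) / len(data) for cat, data in values.items()}

print(averages_by_genre(io.StringIO("A,1,happy\nB,3,happy\nC,5,sad\nD,9,other\n")))
# {'happy': 2.0, 'sad': 5.0}
```

Any speed difference compared with the pre-built dictionary is likely small, so it is worth measuring on the real file before switching.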

Patrick Artner