How do I count recurring values in Excel with Python?

Question

I've made a survey with a bunch of questions, and I now want to analyze the answers with Python. The answers to one of the questions are:

meetbare_factoren_klimaat
Beschermen van ecosystemen;CO2-uitstoot;Mate van toegang tot betaalbare en duurzame energie voor iedereen
Beschermen van ecosystemen;Beschermen van biodiversiteit;CO2-uitstoot;Methaan (CH4)-uitstoot;Lachgas of distikstofoxide (N2O)-uitstoot;Belasting van de grond door stikstof en fosfor;Fijnstof in de lucht
Beschermen van ecosystemen;Beschermen van bossen;Beschermen van biodiversiteit;CO2-uitstoot;Gaten in de ozonlaag;Fijnstof in de lucht
Beschermen van ecosystemen;Beschermen van bossen;Beschermen van biodiversiteit;Mate van toegang tot betaalbare en duurzame energie voor iedereen
Beschermen van bossen;Beschermen van biodiversiteit;Fijnstof in de lucht
Beschermen van bossen;CO2-uitstoot;Fijnstof in de lucht
Beschermen van ecosystemen;Beschermen van bossen;Beschermen van biodiversiteit;CO2-uitstoot;Methaan (CH4)-uitstoot;Mate van toegang tot betaalbare en duurzame energie voor iedereen
Beschermen van ecosystemen;Beschermen van biodiversiteit;CO2-uitstoot;Fijnstof in de lucht
Beschermen van ecosystemen;Beschermen van bossen;Beschermen van biodiversiteit;CO2-uitstoot;Methaan (CH4)-uitstoot;Lachgas of distikstofoxide (N2O)-uitstoot;Ozon (O3)-uitstoot;Fluorgassen-uitstoot;Gaten in de ozonlaag;Belasting van de grond door stikstof en fosfor;Fijnstof in de lucht;Mate van toegang tot betaalbare en duurzame energie voor iedereen;Effect op menselijke gezondenheid
Beschermen van ecosystemen;Beschermen van bossen;Beschermen van biodiversiteit;CO2-uitstoot;Methaan (CH4)-uitstoot;Lachgas of distikstofoxide (N2O)-uitstoot;Ozon (O3)-uitstoot;Fluorgassen-uitstoot;Gaten in de ozonlaag;Belasting van de grond door stikstof en fosfor;Fijnstof in de lucht
Beschermen van ecosystemen;Beschermen van bossen;Beschermen van biodiversiteit;CO2-uitstoot
Beschermen van bossen;Beschermen van biodiversiteit;Mate van toegang tot betaalbare en duurzame energie voor iedereen

With this question, you can choose multiple of the given options, and suggest your own option(s).

I can build a list of the chosen answers for each row in a for loop by using split(";") (the options are separated by ";").

I need my output to look something like this:

Beschermen van ecosystemen: 6 times (60%)

Beschermen van bossen: 4 times (30%)

Beschermen van biodiversiteit: 3 times (20%)

CO2-uitstoot: 0 times (0%)

Methaan (CH4)-uitstoot: 10 times (80%)

Lachgas of distikstofoxide (N2O)-uitstoot: 5 times (50%)

So I need to count the number of times each specific value is present in my data.

I've tried many things by now and I just can't figure it out.

This was my first attempt (more elifs needed for all the values, the values here are different, but that doesn't matter):

finan_sit_count = 0
ver_twe_recht_pol_count = 0
onv_count = 0
soc_cont_count = 0

for row in range(2, 12):
    char = "T"
    factoren_list = ws[char + str(row)].value.split(";")
    if "Tevredenheid met financiële situatie" in factoren_list:
        finan_sit_count += 1
    elif "Vertrouwen in tweede kamer, rechters en politie" in factoren_list:
        ver_twe_recht_pol_count += 1
    elif "Mate van onveiligheidsgevoelens" in factoren_list:
        onv_count += 1
    elif "Tevredenheid met sociale contacten" in factoren_list:
        soc_cont_count += 1

print("\nTevredenheid met financiële situatie: " + str(finan_sit_count))
print("Tevredenheid met financiële situatie in %: " +
      cnvt_to_procent_string(finan_sit_count, 12, 0))

print("\nVertrouwen in tweede kamer, rechters en politie: " +
      str(ver_twe_recht_pol_count))
print("Vertrouwen in tweede kamer, rechters en politie in %: " +
      cnvt_to_procent_string(ver_twe_recht_pol_count, 12, 0))

print("\nMate van onveiligheidsgevoelens: " + str(onv_count))
print("Mate van onveiligheidsgevoelens in %: " +
      cnvt_to_procent_string(onv_count, 12, 0))

print("\nTevredenheid met sociale contacten: " + str(soc_cont_count))
print("Tevredenheid met sociale contacten in %: " +
      cnvt_to_procent_string(soc_cont_count, 12, 0))

I thought this would work, even though it doesn't look very efficient, but it didn't count everything.

After that I tried many things. One that I almost got to work was using Counter() from collections.


for row in range(2, 12):
    char = "T"
    factoren_list = ws[char + str(row)].value.split(";")
    print("\n Factoren list" + str(row) + ":")
    print(factoren_list)
    result = list(Counter(factoren_list).items())
    print("\n Result" + str(row) + ":")
    print(result)
    factoren.update(result)
print("\n Factoren: ")
print(factoren)

The problem with using update is that it doesn't add the amounts together, so the maximum value of something will always be 1.

I need help with this very badly. It seems very simple to fix, and I really need a solution for this. Can somebody please help me out?

Welcome to SO. If I understood correctly what you want, the problem is the elif. When you put an if-elif logic, either none or one option gets executed, but never both. It seems to me that you need multiple (independent) ifs. — fdireito, Dec 12 '21 at 19:25
Say ```a = "X"```. ```if a == "X": do something``` followed by ```elif a == "Z": do other stuff```. You won't do other stuff, because the condition in the if was True. — fdireito, Dec 12 '21 at 19:28
@fdireito Maybe it's just that simple. I'll check if that works right now. — xxx_MoffelGod_xxx, Dec 12 '21 at 19:30
Look at from collections import defaultdict. In your code answers = defaultdict(int). Each answer will be a key and count will be the value. See: https://stackoverflow.com/questions/5900578/how-does-collections-defaultdict-work — Carl_M, Dec 12 '21 at 19:32
@fdireito that does look like it helped, but it still gives me wrong answers, and I don't know why. — xxx_MoffelGod_xxx, Dec 12 '21 at 19:38

ljdyer · Answer 1 · 2021-12-13T20:26:29.917

Your code will be very difficult to maintain (when adding new values, etc.) with all the strings hard-coded into the if statements and then repeated later in the print statements, so I'd begin by storing these in a list that you can iterate over.

phrases_to_search = [
    "Tevredenheid met financiële situatie",
    "Vertrouwen in tweede kamer, rechters en politie",
    "Mate van onveiligheidsgevoelens",
    "Tevredenheid met sociale contacten",
]

I'd then combine all the responses from the rows you are searching into a single list to make the search easier. Something like:

char = "T"
row_range = range(2, 12)
all_responses = [
    response
    for row in row_range
    for response in ws[char + str(row)].value.split(";")
]

You can then get all the counts using a single dictionary comprehension:

response_counts = {p: all_responses.count(p) for p in phrases_to_search}

and print the results with something like:

for r, count in response_counts.items():
    print(r, ":", count)

I've omitted the bit where you print the percentages but I'm sure you can work that out.
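
For the percentages, here is a minimal sketch of one way it could look. It assumes the percentage is meant relative to the number of respondents (rows) and that nobody picks the same option twice in one row. The `rows` list is made-up sample data standing in for the cell values read from `ws`, and `count_with_percentages` is a hypothetical helper name, not something from your code:

```python
# Hypothetical sample data standing in for the cell values read from `ws`.
rows = [
    "Beschermen van ecosystemen;CO2-uitstoot",
    "Beschermen van bossen;CO2-uitstoot;Fijnstof in de lucht",
    "Beschermen van ecosystemen;Beschermen van bossen",
    "CO2-uitstoot",
]

phrases_to_search = [
    "Beschermen van ecosystemen",
    "Beschermen van bossen",
    "CO2-uitstoot",
    "Gaten in de ozonlaag",
]


def count_with_percentages(rows, phrases):
    """Return {phrase: (count, percentage of respondents)}."""
    # Flatten every respondent's answer into one list of chosen options.
    all_responses = [r.strip() for row in rows for r in row.split(";")]
    total = len(rows)
    results = {}
    for p in phrases:
        count = all_responses.count(p)
        # Guard against an empty survey to avoid dividing by zero.
        percent = 100 * count / total if total else 0.0
        results[p] = (count, percent)
    return results


results = count_with_percentages(rows, phrases_to_search)
for phrase, (count, percent) in results.items():
    print(f"{phrase}: {count} times ({percent:.0f}%)")
```

If you also want to pick up options that respondents typed in themselves, `collections.Counter(all_responses)` would count every distinct string without needing a predefined list of phrases.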

edited Dec 13 '21 at 20:26

answered Dec 12 '21 at 19:40

Your solution looks extremely promising and elegant. Should I keep the for loop that I had, or is that not needed? VS Code is now saying that 'char' and 'response' aren't defined. – xxx_MoffelGod_xxx Dec 12 '21 at 20:06
Apologies, I think I missed a part. You should define `char` and `row_range`, but not `response` as that is defined within the list comprehension. I have edited the code above accordingly. Hope it works for you! – ljdyer Dec 13 '21 at 04:54
Thanks for helping. However, it still gives me an error: https://imgur.com/a/S0cjz89 – xxx_MoffelGod_xxx Dec 13 '21 at 10:52
Apologies, the dictionary comprehension part should be `response_counts = {p: all_responses.count(p) for p in phrases_to_search}`. Have edited above. – ljdyer Dec 13 '21 at 20:27
