Python 3: dictionary from CSV file to count frequency of words

Question

I am trying to write a function that reads a CSV file of student volunteers with different degrees. The aim of the function is to create a dictionary where the keys are the degrees and the values are how often each degree occurs.

The data is organized as follows:

name    degree     email

ABC     PhD.       abd@gmail.com
CDE     Ph.D.      cde@gmail.com
FGH     MD,PHD     fgh@gmail.com

The aim is to get a dictionary like this:

# degree_count = {'phd': 3, 'md': 1}

def degree_frequency(csv_file):
    f = open('csv_file')
    csv_f = csv.reader(f)
    #Creating a list to store all the degrees from the csv file
    student_degree_list=[]
    #Creating an empty dictionary to count the frequency
    degree_count={}
    for row in csv_f:
        student_degree_list.append(row[1]) 
    #Replacing fullstops to account for variations in writing degrees ( eg JD vs J.D)
    [word.replace(".", "") for word in student_degree_list]
    [word.lower() for word in student_degree_list]
    for ele in student_degree_list:
        if ele in degree_count:
            degree_count[ele]=degree_count[ele]+1
        else:
            degree_count[ele]=0
    return degree_count

@Aran Frey: I am trying it on an interactive platform. It just says that the test cases failed, without pinpointing the problem, so I am not sure what's wrong with the code. – learning_python, Aug 25 '18 at 06:16
@Tanmay Jain: I have been specifically told not to use pandas. – learning_python, Aug 25 '18 at 06:17
@learning_python What about Counter? [link](https://docs.python.org/3/library/collections.html#collections.Counter) – Tanmay jain, Aug 25 '18 at 06:28
@Tanmay Jain: I can use Counter. Thanks for the solution with pandas; pandas is easier for me to understand. – learning_python, Aug 25 '18 at 06:32

score 0 · Answer 1 · answered Aug 24 '18 at 23:43

I believe your problem is that the two lines below have no effect: each list comprehension builds a new list, and that list is discarded unless you assign it to a variable.

[word.replace(".", "") for word in student_degree_list]
[word.lower() for word in student_degree_list]
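
A minimal sketch of why those two lines do nothing, assuming a small hypothetical list named `degrees` (not the asker's data): `str.replace` and `str.lower` return new strings, and the comprehension returns a new list, so the original list only changes once the result is assigned back.

```python
# Hypothetical sample data, not taken from the asker's file
degrees = ["PhD.", "Ph.D.", "MD"]

# The result is built and immediately thrown away; `degrees` is untouched
[word.replace(".", "").lower() for word in degrees]
unchanged = list(degrees)

# Assigning the result back is what actually updates the name
degrees = [word.replace(".", "").lower() for word in degrees]

print(unchanged)  # ['PhD.', 'Ph.D.', 'MD']
print(degrees)    # ['phd', 'phd', 'md']
```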

Also, if a degree has 1 occurrence shouldn't it be set to 1 and not 0?

Corrected code (note that it still treats each cell as a single degree, so 'MD,PHD' is counted as one entry, and it counts the header row if the file has one):

import csv

# Expected result: degree_count = {'phd': 3, 'md': 1}

def degree_frequency(csv_file):
    # Creating a list to store all the degrees from the csv file
    student_degree_list = []
    # Creating an empty dictionary to count the frequency
    degree_count = {}
    # Open the path that was passed in, not the literal string 'csv_file'
    with open(csv_file, newline='') as f:
        for row in csv.reader(f):
            student_degree_list.append(row[1])
    # Removing full stops and lowercasing to account for variations in writing degrees (e.g. JD vs J.D.)
    student_degree_list = [word.replace('.', '').lower() for word in student_degree_list]
    for ele in student_degree_list:
        if ele in degree_count:
            degree_count[ele] += 1
        else:
            # The first occurrence counts as 1, not 0
            degree_count[ele] = 1
    return degree_count
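
The expected result in the question also needs cells such as `MD,PHD` to be split into separate degrees and the header row to be ignored. Below is a hedged sketch of one way to do both. It assumes degrees inside a cell are separated by commas or whitespace, and that the file has a header row with a `degree` column. The name `degree_frequency_split` and the in-memory sample are hypothetical, for illustration only.

```python
import csv
import io
import re
from collections import Counter


def degree_frequency_split(lines):
    """Count degrees from an iterable of CSV lines (e.g. an open file).

    Assumes a header row containing a 'degree' column.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        # Split on commas and/or whitespace so 'MD,PHD' yields two degrees
        for degree in re.split(r"[,\s]+", row["degree"]):
            cleaned = degree.replace(".", "").lower()
            if cleaned:  # skip empty strings left by leading/trailing separators
                counts[cleaned] += 1
    return dict(counts)


# In-memory stand-in for the file; the quoted cell keeps 'MD,PHD' in one field
sample = io.StringIO(
    'name,degree,email\n'
    'ABC,PhD.,abd@gmail.com\n'
    'CDE,Ph.D.,cde@gmail.com\n'
    'FGH,"MD,PHD",fgh@gmail.com\n'
)
print(degree_frequency_split(sample))  # {'phd': 3, 'md': 1}
```

With a real file, the same function can be given the file object from `open(path, newline='')` inside a `with` block.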

answered Aug 24 '18 at 23:43 by Alex Palumbo

Okay then, why not just use one list comprehension? `[word.replace(".", "").lower() for word in student_degree_list]` – U13-Forward Aug 25 '18 at 00:45
I'm confused. I reused his list comprehension but assigned the resulting new list to the student_degree_list variable so that it actually changed the list. – Alex Palumbo Aug 25 '18 at 01:08
What do you mean? I basically merged the two list comprehensions. – U13-Forward Aug 25 '18 at 01:10
Okay, I see what you are saying. I did only use one list comprehension. Look at my answer... – Alex Palumbo Aug 25 '18 at 22:37

Tanmay jain · Answer 2 · 2018-08-25T09:55:22.897

import csv
from collections import Counter, defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('csv_file.csv') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

credit for csv reader code

degree_list = columns['degree']
degree_list_clean = []

for cad_degrees in degree_list:
    cad_degrees_lst = cad_degrees.split()
    for degree in cad_degrees_lst:
        degree_clean = degree.strip().replace('.','').lower()
        degree_list_clean.append(degree_clean)

option 1

output_dict_counter_version = dict(Counter(degree_list_clean))
print(output_dict_counter_version)

option 2

degree_frequency_dict = {}

for deg in degree_list_clean:
    if deg in degree_frequency_dict:
        degree_frequency_dict[deg] += 1
    else:
        degree_frequency_dict[deg] = 1

print(degree_frequency_dict)
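
Option 2 can also be written without the if/else branch, which removes the spot where the count in the question was initialised to 0 by mistake. This is a minimal sketch, assuming `degree_list_clean` holds the cleaned values from the sample input; the list literal below is a stand-in for it.

```python
# Stand-in for the cleaned list built above (values from the sample input)
degree_list_clean = ["phd", "phd", "md", "phd"]

degree_frequency_dict = {}
for deg in degree_list_clean:
    # dict.get returns 0 for a degree not seen yet, so the first hit becomes 1
    degree_frequency_dict[deg] = degree_frequency_dict.get(deg, 0) + 1

print(degree_frequency_dict)  # {'phd': 3, 'md': 1}
```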

Using pandas

import pandas as pd
from collections import Counter

data = pd.read_csv("csv_file.csv")
degree_list = data['degree'].tolist()


degree_list_clean = []

for cad_degrees in degree_list:
    cad_degrees_lst = cad_degrees.split()
    for degree in cad_degrees_lst:
        degree_clean = degree.strip().replace('.','').lower()
        degree_list_clean.append(degree_clean)

print(dict(Counter(degree_list_clean)))



'''
------------------ Input
name,degree,email
ABC,PhD. ,abd@gmail.com
CDE,Ph.D. ,cde@gmail.com
FGH, MD PHD ,fgh@gmail.com

-------------------- Output
{'phd': 3, 'md': 1}
'''
