0

I'm new here and also new to programming. I'm trying to teach myself a bit of Python and I've run into a problem. I have a very specific CSV file which looks like this (I was able to do this with simpler CSV files using the advice in "Creating a dictionary from a csv file?", but I'm struggling now):

 1 row: Names,0,1900,1901, ---- ,2015

 2 row: Aaron,0,0,0, ----, 44

 x row: Randomname,0,number_of_babies_named_by_Randomname_in_year_1900, number_of_babies_named_by_Randomname_in_year_1901

There are 3550 rows in total.

Is there any way to create a dictionary I could navigate, so that I could write a function that tells me in which year a specific name was most popular, or which name was the most commonly used overall between 1900 and 2015?

Thanks in advance! (sorry for potential grammar errors)

Community

3 Answers

0

I haven't tested the code because I don't have the CSV file, but I'd do something like this. Bear in mind it's a quick-and-dirty way to do it, but I think it works, and you can improve it from there.

import csv

name_to_year_count = {}
with open('names.csv', newline='') as f:
    csv_f = csv.reader(f)
    header = next(csv_f)  # header row: Names,0,1900,1901,...,2015
    # columns 0 and 1 hold the name and the extra "0" column, so years start at column 2
    years = [int(year) for year in header[2:]]
    for row in csv_f:
        # map each year to its count, stored as an int so comparisons are numeric
        name_to_year_count[row[0]] = {year: int(count) for year, count in zip(years, row[2:])}

Then to find the year when a name was most popular, an easy way is to sort that name's dictionary by count:

import operator

def find_top_year(name):
    name_dict = name_to_year_count[name]
    # sort the (year, count) pairs by count, in ascending order
    sorted_year = sorted(name_dict.items(), key=operator.itemgetter(1))
    return sorted_year[-1][0]

Can you test it with your csv file?

Vasilis
0

Just to get you started, here is an idea.
Create a dictionary in which every row is an entry.
Use the name as the key, with the rest of the row as the value. You could store the value as a list. So for example:

d = {}
d['Aaron'] = [0,0,0, ----, 44]
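A minimal sketch of how d could be filled from the file with the csv module. It assumes the layout described in the question (a header row, then one row per name), and it keeps the whole rest of the row as the value, including the extra "0" column. The function name load_counts and the sample data are made up for illustration.

```python
import csv
import io


def load_counts(lines):
    """Return {name: [rest of the row as ints]} from an iterable of CSV lines."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row: Names,0,1900,1901,...
    # row[0] is the name; everything after it is converted to int
    return {row[0]: [int(c) for c in row[1:]] for row in reader if row}


# Tiny made-up sample in the same layout, used here instead of the real file.
sample = io.StringIO("Names,0,1900,1901,1902\nAaron,0,0,3,44\nMary,0,7,5,2\n")
d = load_counts(sample)
print(d["Aaron"])  # [0, 0, 3, 44]
```

For the real data, something like `with open('names.csv', newline='') as f: d = load_counts(f)` should work, assuming the file is named names.csv.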

Now you could easily find in which year the name was most common:

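# d[name][0] is the extra "0" column from the file, so skip it;
# index 0 of the remaining list then corresponds to 1900
index, freq = max(enumerate(d['specific-name'][1:]), key=lambda x: x[1])
year = index + 1900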

In a similar manner you could find the most common name between 1900-2015 by going over the dictionary.
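Going over the dictionary to find the overall most common name could look like the sketch below. The counts are made up and the helper name most_common_name is hypothetical; it assumes each list starts with the extra "0" column followed by one count per year.

```python
def most_common_name(d):
    """Return the name with the largest total count across all years."""
    # d[name][1:] skips the extra "0" column that precedes the 1900 count
    return max(d, key=lambda name: sum(d[name][1:]))


# Made-up counts in the layout [extra column, 1900, 1901, 1902]
d = {
    "Aaron": [0, 0, 3, 44],
    "Mary": [0, 7, 5, 2],
    "Randomname": [0, 1, 1, 1],
}
print(most_common_name(d))  # Aaron
```

If two names have the same total, max returns whichever it encounters first, so ties may need separate handling.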

Tal J. Levy
0

I think this is most of what you are asking for:

# CSV string (could be read in from a file)
csvString = """Joseph, 0, 1900, 1901, ---- , 2015
            Ishmael, 0, 1902, 1904, ---- , 2015
            Mary, 0, 1904, 1905, ---- , 2015"""

# Create an empty list to store all the dictionaries
dictionaryList = []

# Split the CSV string into individual CSV lines
csvList = csvString.split("\n")

# Loop through all entries in the CSV file
for csvLine in csvList:
    # Split CSV string
    csvValues = csvLine.split(",")


    # Create dictionary
    dictionary = {}
    dictionary["name"] = csvValues[0].strip()
    dictionary["numberOfBabies"] = csvValues[1].strip()
    dictionary["year1"] = csvValues[2].strip()
    dictionary["year2"] = csvValues[3].strip()

    # Add dictionary to list
    dictionaryList.append(dictionary)


# Print contents of all dictionaries    
for dictionaryEntry in dictionaryList:    
    print(dictionaryEntry["name"])
    print(dictionaryEntry["numberOfBabies"])
    print(dictionaryEntry["year1"])
    print(dictionaryEntry["year2"])
Razor Robotics