I have a data set containing each person's first and last name and their age. My initial thought was to store the data in a dictionary keyed by name, but the data set can contain duplicate names, and each duplicate would overwrite the earlier entry for that name. Is there another data structure I may be missing that could store this data? (I am reading the data from a CSV file.)

Example of the data below in a dictionary:

{'nelson bighetti': 37, 'rick sanchez': 34, 'linda mort': 25}

The end goal for this data would be to run some simple calculations such as finding the average age of all the people, and the median age of the dataset.

dmc94
  • 2
    A dictionary with keys being names, and values being a list containing the ages of all people with that name? – jasonharper Apr 03 '20 at 03:46
  • That is an interesting idea and I like it, but it leads me to a new question: how would I run operations on this data? For example, I am looking for the average age and the median age of this data set. – dmc94 Apr 03 '20 at 03:48
  • 1
    You can run statistics functions on the dictionary's values. I would also suggest looking at this example: https://stackoverflow.com/questions/10664856/make-a-dictionary-with-duplicate-keys-in-python – parlad Apr 03 '20 at 03:51
  • Thank you for that link! Running stats methods on the list values throws some interesting errors. – dmc94 Apr 03 '20 at 04:00
  • 1
    It's not clear really what you intend to *do* with the data in the data structure, so it's hard to tell what features would be valuable to you. – Blckknght Apr 03 '20 at 04:00
  • Yes, you are correct. I have edited the original post with more information about what I would like to do with the data. (I am trying to find things like the average age and median age within the data set.) – dmc94 Apr 03 '20 at 04:03

1 Answer

If you are searching for the average and median age of this data set, and same-named people are considered to be different individuals (with possibly different ages), then you don't actually need the name data! Just do your operation over the ages, and disregard the names.
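
As a minimal sketch of that idea, the snippet below reads only the age column and passes it to the standard library's `statistics` module. The CSV layout (a header row with `name` and `age` columns) and the helper name `read_ages` are assumptions made for illustration; adjust them to match the real file.

```python
import csv
import io
import statistics

# Hypothetical CSV contents; the real file's column names are an assumption.
csv_text = """name,age
nelson bighetti,37
rick sanchez,34
linda mort,25
rick sanchez,58
rick sanchez,7
"""


def read_ages(handle):
    """Return the age column as a list of ints, ignoring the names."""
    reader = csv.DictReader(handle)
    return [int(row["age"]) for row in reader]


# io.StringIO stands in for open("people.csv", newline="") here.
ages = read_ages(io.StringIO(csv_text))

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
print(mean_age, median_age)
```

Because the names are never used as keys, duplicate names cannot overwrite one another.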

As a more general answer, why not just use a list of lists (or, equivalently, a list of tuples), like this:

data = [('nelson bighetti', 37),
        ('rick sanchez', 34),
        ('linda mort', 25),
        ('rick sanchez', 58),
        ('rick sanchez', 7),
        # ...
       ]

That is as good as your original data (a CSV file), and you can do all operations on age like so:

for name, age in data:
    # do some operation on age here
    pass  # placeholder so the loop body is valid Python
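
One way the loop could be filled in, sketched with only the five rows shown above: the overall statistics come straight from the ages, and a `defaultdict(list)` groups ages by name in case per-name figures are wanted later (the dictionary-of-lists idea from the comments). The variable names `ages_by_name` and `median_by_name` are illustrative and do not come from the original post.

```python
from collections import defaultdict
import statistics

# Sample rows only; the real data would come from the CSV file.
data = [
    ("nelson bighetti", 37),
    ("rick sanchez", 34),
    ("linda mort", 25),
    ("rick sanchez", 58),
    ("rick sanchez", 7),
]

# Overall statistics need only the ages.
ages = [age for _name, age in data]
overall_mean = statistics.mean(ages)
overall_median = statistics.median(ages)

# Optional: group ages by name without losing duplicates.
ages_by_name = defaultdict(list)
for name, age in data:
    ages_by_name[name].append(age)

# Median age for each distinct name.
median_by_name = {
    name: statistics.median(values) for name, values in ages_by_name.items()
}

print(overall_mean, overall_median)
print(median_by_name)
```

Keeping the flat list as the primary structure preserves every row, and the grouped view can be rebuilt from it whenever it is needed.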
Dan H