I have been working on an assignment that gathers data from a large dataset (about 500 MB) and counts how many times each item appears. Several dictionaries read CSV files and combine the data, and my final dict looks like the one below after all of the data has been gathered and processed.

I am almost done with the assignment but am stuck on this section: I need to find the top 5 values across all keys.

I have the following dictionary, printed using: print key, task1[key]

KEY KEYVALUE

WA [[('1082225', 29), ('845195', 21), ('265021', 17)]]
DE [[('922397', 44), ('627084', 40), ('627297', 14)]]
DC [[('774648', 17), ('911624', 17), ('771241', 16)]]
WI [[('12618', 25), ('242582', 23), ('508727', 22)]]
WV [[('476050', 4), ('1016620', 3), ('769611', 3)]]
HI [[('466263', 5), ('226000', 5), ('13694', 4)]]

I need to go through it and find the top 5 values along with their state key and ID number. For example:

  1. DE 922397 44
  2. DE 627084 40
  3. WA 1082225 29

What would be the best way to do this?

EDIT: This is how I am putting together my task1 dictionary:

task1 = {}
for key,val in courses.items():
    task1[key] = [sorted(courses[key].iteritems(), key=operator.itemgetter(1), reverse=True)[:5]]
Andy P
  • What does the actual dictionary look like? The thing you posted is not valid Python syntax. – Cory Kramer Oct 14 '14 at 18:03
  • Relevant http://stackoverflow.com/questions/268272/getting-key-with-maximum-value-in-dictionary – Celeo Oct 14 '14 at 18:03
  • @cyber That is what Python outputs when I print using: `for key,val in courses.items(): print key, task1[key]` – Andy P Oct 14 '14 at 18:04
  • `task1[key] = [sorted(courses[key].iteritems(), key=operator.itemgetter(1), reverse=True)[:5]]` has an unnecessary pair of `[]` around it. This works just fine: `task1[key] = sorted(courses[key].iteritems(), key=operator.itemgetter(1), reverse=True)[:5]` – EML Oct 14 '14 at 18:21

1 Answer

Assuming your dict looks something like:

mydict = {'WA': [('1082225', 29), ('845195', 21), ('265021', 17)], 'DE': [('922397', 44), ('627084', 40), ('627297', 14)], ...}

This is not an ideal representation for finding the largest values across all keys. The following comprehension flattens the dict into a single list of (key, ID, value) tuples:

data = [(k, idnum, v) for k, kvlist in mydict.items() for idnum, v in kvlist]

Now the data will look like:

[('WA', '1082225', 29), ('WA', '845195', 21), ('WA', '265021', 17), ('DE', '922397', 44), ...]

In this format the data is easy to read, and it is obvious what we need to search. This line sorts the new tuples in descending order by their value at index 2 and keeps the first 5, which are the top 5 you asked for:

sorted(data, key=lambda x: x[2], reverse=True)[:5]
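
If only the top 5 entries are needed, one alternative is `heapq.nlargest` from the standard library, which avoids sorting the whole list. A minimal sketch using the sample rows from the question (the name `top5` is only for illustration, and the print call is written so it should run on both Python 3 and Python 2.7):

```python
import heapq

# Sample data from the question, with the extra list level already removed.
mydict = {
    'WA': [('1082225', 29), ('845195', 21), ('265021', 17)],
    'DE': [('922397', 44), ('627084', 40), ('627297', 14)],
    'DC': [('774648', 17), ('911624', 17), ('771241', 16)],
    'WI': [('12618', 25), ('242582', 23), ('508727', 22)],
    'WV': [('476050', 4), ('1016620', 3), ('769611', 3)],
    'HI': [('466263', 5), ('226000', 5), ('13694', 4)],
}

# Flatten to (state, id, count) tuples, as in the answer above.
data = [(k, idnum, v) for k, kvlist in mydict.items() for idnum, v in kvlist]

# Keep only the 5 tuples with the largest count (index 2).
top5 = heapq.nlargest(5, data, key=lambda x: x[2])

for rank, (state, idnum, count) in enumerate(top5, 1):
    print("%d. %s %s %d" % (rank, state, idnum, count))
```

On this sample the first three lines printed match the example in the question (DE 922397 44, DE 627084 40, WA 1082225 29).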

Note: the dictionary you provided has an unnecessary [], so I removed that from the answer for clarity.
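
If you would rather keep `task1` exactly as the question builds it, with the extra list level, the comprehension needs one more loop. A sketch, assuming every value is a one-element list wrapping the list of (ID, count) pairs, as in the printed output (only two of the states are reproduced here to keep it short):

```python
# task1 as printed in the question: each value has an extra outer list.
task1 = {
    'WA': [[('1082225', 29), ('845195', 21), ('265021', 17)]],
    'DE': [[('922397', 44), ('627084', 40), ('627297', 14)]],
}

# The extra `for pairs in outer` unwraps the outer list before
# iterating over the (id, count) pairs.
data = [(state, idnum, count)
        for state, outer in task1.items()
        for pairs in outer
        for idnum, count in pairs]

# Sort by count, largest first, and keep the first 5.
top = sorted(data, key=lambda x: x[2], reverse=True)[:5]
```

Removing the extra brackets where `task1` is built is probably still the simpler fix, since every later step then works on a plain list of pairs.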

Edited after clarification.

EML