I need to load data from a csv file or an excel sheet (with rows and columns) into a two-dimensional python dict. For example, if the data in an excel sheet looks like this:
name age gender location
1 Jim 18 male China
2 Ross 18 male China
3 Cara 19 female Japan
4 Ted 18 male China
Then the output python dict should look like this:
data = {
1: {'name': 'Jim', 'age': 18, 'gender': 'male', 'location': 'China'},
2: {'name': 'Ross', 'age': 18, 'gender': 'male', 'location': 'China'},
3: {'name': 'Cara', 'age': 19, 'gender': 'female', 'location': 'Japan'},
4: {'name': 'Ted', 'age': 18, 'gender': 'male', 'location': 'China'}
}
You can see that there are a lot of duplicate infos in this 2-d dict (and for real data, it has the same condition), so I came up with an idea of developing a new dict with shared memory. To be specific, in the example above, I want my 2-d dict only save one copy of {'age': 18, 'gender': 'male', 'location': 'China'}
across multiple rows (these rows do not need to be adjacent). If we call data[1]['age']
and data[2]['age']
, it should do the lookup in the same extracted small shared dict.
I have read the source code of python dict, and I know python dict only save pointers to keys and values (and usually for small int and string object, different pointers may point to the same object). So when I mean I want to save only one copy, I mean one copy of pointers.
Any idea about how to design this dict? Thanks very much!!!
EDIT
Sorry, I forget to mention. The data in this 2-d dict will be read-only.