2

The output of my code gives the following:

[{'Total Population:': 4585, 'Total Water Ice Cover': 2.848142234497044, 'Total Developed': 17.205368316575324, 'Total Barren Land': 0.22439908514219134, 'Total Forest': 34.40642126612868},

 {'Total Population:': 4751, 'Total Water Ice Cover': 1.047783534830167, 'Total Developed': 37.27115716753022, 'Total Barren Land': 0.11514104778353484, 'Total Forest': 19.11341393206678},

 {'Total Population:': 3214, 'Total Water Ice Cover': 0.09166603009701321, 'Total Developed': 23.50469788404247, 'Total Barren Land': 0.2597204186082041, 'Total Forest': 20.418608204109695},

 {'Total Population:': 5005, 'Total Water Ice Cover': 0.0, 'Total Developed': 66.37545713124746, 'Total Barren Land': 0.0, 'Total Forest': 10.68671271840715},

...
]

What I'd like to do is get all the values for 'Total Population' and store them in one list, then get all the values for 'Total Water Ice Cover' and store them in another list, and so on. With a data structure like this, how does one extract these values and store them in separate lists?

Thank you

  • What do you want to have happen if one of the dictionaries has different keys than the others? For example, what should happen if not every dictionary has a "Total Population" value? – Daniel Pryden Dec 11 '18 at 20:31
  • Where are you getting stuck? Do you know how to iterate through the list? Do you know how to access a value in the dictionary by key? Can you then append these values to separate lists? I'd also suggest that you look into [pandas DataFrames](https://stackoverflow.com/questions/20638006/convert-list-of-dictionaries-to-dataframe). – pault Dec 11 '18 at 20:32
  • @DanielPryden I want to be able to calculate pearsons r correlation – user10777757 Dec 11 '18 at 20:38

5 Answers

2

If your goal is to calculate Pearson's correlation, you should use pandas for this.

Suppose your original list of dictionaries was stored in a variable called output. You can easily convert it into a pandas DataFrame using:

import pandas as pd
df = pd.DataFrame(output)
print(df)
#   Total Barren Land  Total Developed  Total Forest  Total Population:  Total Water Ice Cover
#0           0.224399        17.205368     34.406421               4585               2.848142
#1           0.115141        37.271157     19.113414               4751               1.047784
#2           0.259720        23.504698     20.418608               3214               0.091666
#3           0.000000        66.375457     10.686713               5005               0.000000

Now you can easily generate a correlation matrix:

# this is just to make the output print nicer
pd.set_option("display.precision", 4)  # only show 4 digits

# remove 'Total ' from column names to make printing smaller
df.rename(columns=lambda x: x.replace("Total ", ""), inplace=True)  

corr = df.corr(method="pearson")
print(corr)
#                 Barren Land  Developed  Forest  Population:  Water Ice Cover
#Barren Land           1.0000    -0.9579  0.7361      -0.7772           0.4001
#Developed            -0.9579     1.0000 -0.8693       0.5736          -0.6194
#Forest                0.7361    -0.8693  1.0000      -0.1575           0.9114
#Population:          -0.7772     0.5736 -0.1575       1.0000           0.2612
#Water Ice Cover       0.4001    -0.6194  0.9114       0.2612           1.0000

Now you can access individual correlations by key:

print(corr.loc["Forest", "Water Ice Cover"])
#0.91135717479534217
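
To see what a single entry of that matrix means, here is a minimal sketch that computes Pearson's r by hand for two of the columns, using only the standard library. The helper name pearson_r is made up for illustration; the values are the four sample rows from the question.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: summed co-deviations divided by the product of the root summed squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_dev / (spread_x * spread_y)

# Sample values copied from the question's four rows
forest = [34.40642126612868, 19.11341393206678, 20.418608204109695, 10.68671271840715]
water_ice = [2.848142234497044, 1.047783534830167, 0.09166603009701321, 0.0]

r = pearson_r(forest, water_ice)
print(round(r, 4))  # should agree with corr.loc["Forest", "Water Ice Cover"]
```

Applying the same helper to any other pair of columns should give the corresponding cell of the matrix.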
pault
  • THANK YOU THIS IS GREAT! What would be the 'r' here? Between population and land cover type? Would each one of those numbers represent a different 'r'. I am having trouble understanding where the pearson r correlation is between pop and land type. – user10777757 Dec 11 '18 at 21:01
  • This is a correlation matrix- each row/col is the "r" value between that row and that column. Land type is not a variable in your example. – pault Dec 11 '18 at 21:06
  • Thank you so much. So using the key will give me 'r'. Fantastic answer. I might ask a separate question or add this to this one. I am trying to then perform multiple linear regression between the population density and area percentage of the following surface covers and calculate the R2 of the regression: developed, class planted/cultivated class and maybe some other. Could this also be done through pandas? – user10777757 Dec 11 '18 at 21:10
  • Yes it can certainly be done. Do a search for multiple linear regressions in python and you should see examples using `pandas` and `sklearn` – pault Dec 11 '18 at 21:12
  • Thank you. Is multivariate and multiple linear regression the same? unfortunately I'm not seeing anything that fits my dataset. I see a lot of links related to linear regression but not multiple linear regression. – user10777757 Dec 11 '18 at 21:21
  • Hey I am getting an error it says that: AttributeError: 'int' object has no attribute replace. – user10777757 Dec 11 '18 at 23:00
  • I commented out the line: df.rename(columns=lambda x: x.replace("Total ", ""), inplace=True) .... and I get an empty dataframe. No columns no indexes. – user10777757 Dec 11 '18 at 23:15
1

I guess you can use something like:

d = [{'Total Population:': 4585, 'Total Water Ice Cover': 2.848142234497044, 'Total Developed': 17.205368316575324, 'Total Barren Land': 0.22439908514219134, 'Total Forest': 34.40642126612868},
 {'Total Population:': 4751, 'Total Water Ice Cover': 1.047783534830167, 'Total Developed': 37.27115716753022, 'Total Barren Land': 0.11514104778353484, 'Total Forest': 19.11341393206678},
 {'Total Population:': 3214, 'Total Water Ice Cover': 0.09166603009701321, 'Total Developed': 23.50469788404247, 'Total Barren Land': 0.2597204186082041, 'Total Forest': 20.418608204109695},
 {'Total Population:': 5005, 'Total Water Ice Cover': 0.0, 'Total Developed': 66.37545713124746, 'Total Barren Land': 0.0, 'Total Forest': 10.68671271840715}]

f = {}
for row in d:
    for k, v in row.items():
        # start a new list the first time a key is seen
        if k not in f:
            f[k] = []
        f[k].append(v)
print(f)

{'Total Population:': [4585, 4751, 3214, 5005], 'Total Water Ice Cover': [2.848142234497044, 1.047783534830167, 0.09166603009701321, 0.0], 'Total Developed': [17.205368316575324, 37.27115716753022, 23.50469788404247, 66.37545713124746], 'Total Barren Land': [0.22439908514219134, 0.11514104778353484, 0.2597204186082041, 0.0], 'Total Forest': [34.40642126612868, 19.11341393206678, 20.418608204109695, 10.68671271840715]}


Pedro Lobito
1

You could use pandas (here my_dict is the list of dictionaries from the question):

import pandas as pd
pd.DataFrame(my_dict).to_dict(orient='list')

Returns:

{'Total Barren Land': [0.22439908514219134, 0.11514104778353484, 0.2597204186082041, 0.0],
'Total Developed': [17.205368316575324, 37.27115716753022, 23.50469788404247, 66.37545713124746],
'Total Forest': [34.40642126612868, 19.11341393206678, 20.418608204109695, 10.68671271840715],
'Total Population:': [4585, 4751, 3214, 5005],
'Total Water Ice Cover': [2.848142234497044, 1.047783534830167, 0.09166603009701321, 0.0]}
rahlf23
0

Call your list of dictionaries dictionary_list. Then:

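keys = {k for d in dictionary_list for k in d.keys()}
list_of_values = [[v for d in dictionary_list for k, v in d.items() if k == key] for key in keys]

Using your example, this outputs the following (the order of the inner lists follows the set's iteration order, which is arbitrary):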

[[17.205368316575324, 37.27115716753022, 23.50469788404247, 66.37545713124746],
 [0.22439908514219134, 0.11514104778353484, 0.2597204186082041, 0.0],
 [2.848142234497044, 1.047783534830167, 0.09166603009701321, 0.0],
 [4585, 4751, 3214, 5005],
 [34.40642126612868, 19.11341393206678, 20.418608204109695, 10.68671271840715]]

If you want a new dictionary with the relevant value lists, replace the second line with:

new_dict = {key: [v for d in dictionary_list for k, v in d.items() if k == key] for key in keys}
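
Because a set has no guaranteed order, a variant that keeps the keys in first-seen order may be easier to work with. The following is only a sketch under that assumption, on made-up toy data rather than the question's records; it also looks each key up directly instead of scanning every item of every dict.

```python
# Toy records for illustration only (not from the question)
dictionary_list = [
    {"a": 1, "b": 2},
    {"a": 3, "b": 4, "c": 5},
]

# dict.fromkeys keeps the first-seen order of the keys (guaranteed from Python 3.7)
keys = list(dict.fromkeys(k for d in dictionary_list for k in d))

# Collect the values per key, skipping dicts that lack that key
new_dict = {key: [d[key] for d in dictionary_list if key in d] for key in keys}
print(new_dict)  # {'a': [1, 3], 'b': [2, 4], 'c': [5]}
```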
ShlomiF
0

If all the dicts have the same keys, then you can just use the keys of the first dict:

result = {k:[d[k] for d in dictionary_list] for k in dictionary_list[0].keys()} 
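
If you are not sure the "same keys" assumption holds, it can be checked first. This is a small sketch using two of the columns from the question; raising ValueError is just one possible way to react to a mismatch.

```python
# Two of the question's columns, first two rows
dictionary_list = [
    {"Total Population:": 4585, "Total Forest": 34.40642126612868},
    {"Total Population:": 4751, "Total Forest": 19.11341393206678},
]

first_keys = dictionary_list[0].keys()
# dict key views compare like sets, so key order does not matter here
same_keys = all(d.keys() == first_keys for d in dictionary_list)

if not same_keys:
    raise ValueError("dicts have different keys")

result = {k: [d[k] for d in dictionary_list] for k in first_keys}
print(result)
```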

If the dicts could have different sets of keys, but you're OK with lists of different lengths, I would use a defaultdict to simplify:

from collections import defaultdict
result = defaultdict(list)
for d in dictionary_list:
    for k, v in d.items():
        result[k].append(v)

If the dicts could have different sets of keys, and you want all the lists to be the same length, then you'll need to iterate twice. You'll also need some kind of placeholder value to use for when the key is missing. If we want to use None for that, we can do:

placeholder = None
keys = set()
for d in dictionary_list:
    keys |= set(d.keys())  # sets are merged with |=, not +=
result = {k:[] for k in keys}
for d in dictionary_list:
    for k in keys:
        result[k].append(d.get(k, placeholder))
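
To make the difference between the last two variants concrete, here is a small sketch on made-up records where one dict is missing a key. The names records, ragged and aligned are only for this illustration.

```python
from collections import defaultdict

# Hypothetical records: the second one has no "b" key
records = [{"a": 1, "b": 2}, {"a": 3}, {"a": 5, "b": 6}]

# defaultdict variant: lists can end up with different lengths
ragged = defaultdict(list)
for d in records:
    for k, v in d.items():
        ragged[k].append(v)

# placeholder variant: every list has one entry per record
keys = set()
for d in records:
    keys.update(d)  # iterating a dict yields its keys
aligned = {k: [d.get(k) for d in records] for k in keys}

print(dict(ragged))  # {'a': [1, 3, 5], 'b': [2, 6]}
print(aligned["b"])  # [2, None, 6]
```

With the placeholder variant, position i in every list still refers to record i, which matters if the lists are later paired up, for example to compute a correlation.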

In each case result is a dict of lists. If you want a list of lists it's actually even simpler:

result = [[d[k] for d in dictionary_list] for k in dictionary_list[0].keys()]

If you want all the lists to be the same length and include placeholders then you'll still need to use a dict of lists as an intermediate step. But it's easy to transform from a dict of lists to a list of lists of values:

list_of_lists_of_values = list(dict_of_lists_of_values.values())

That said, prior to Python 3.7, dictionaries didn't have a well-defined iteration order, so you're probably better off using a dictionary anyway, because otherwise it's hard to be certain you're getting the right values (e.g. "Total Population" isn't guaranteed to be the first series of values).

Daniel Pryden