Join DataFrames which are values in Dicts contained in different Lists of Dicts

Question

I have 2 lists of several one element dictionaries each. The dictionaries have a string as key and a DataFrame as value. The keys are not sorted alphabetically within the lists. My goal is to perform pandas join function on the DataFrames in each list corresponding to each dictionary key.

Schematically:

import pandas as pd
import numpy as np

a = pd.DataFrame(np.random.randn(4, 4), columns=['h', 'i', 'j', 'k'])
b = pd.DataFrame(np.random.randn(4, 4), columns=['h', 'i', 'j', 'k'])
c = pd.DataFrame(np.random.randn(4, 4), columns=['h', 'i', 'j', 'k'])
d = pd.DataFrame(np.random.randn(4, 4), columns=['h', 'i', 'j', 'k'])
e = pd.DataFrame(np.random.randn(4, 4), columns=['n', 'o', 'p', 'q'])
f = pd.DataFrame(np.random.randn(4, 4), columns=['n', 'o', 'p', 'q'])
g = pd.DataFrame(np.random.randn(4, 4), columns=['n', 'o', 'p', 'q'])
h = pd.DataFrame(np.random.randn(4, 4), columns=['n', 'o', 'p', 'q'])

d1 = {'m':a}
d2 = {'h':b}
d3 = {'z':c}
d4 = {'b':d}
d5 = {'z':e}
d6 = {'h':f}
d7 = {'m':g}
d8 = {'b':h}

l1 = [d1, d2, d3, d4]
l2 = [d5, d6, d7, d8]

My goal is to join the DataFrames which have a corresponding key on each list, through a loop, in order to have the code matching the keys, ideally:

d1.join(d7)
d2.join(d6)
d3.join(d5)
d4.join(d8)

I cannot find a way to figure out the code to do this. Thanks for any help in advance!

What do you mean by `join`, it's not so clear when both indices overlap. — ALollz, Jul 24 '19 at 20:57
Anyway, I think this is relevant: https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables There are too many dicts and lists here. Why have 8 dicts and 2 lists when this can be organized with 2 dicts, or better yet, even just one with a tuple as the key `('m', 1)` — ALollz, Jul 24 '19 at 21:31
It's likely attracting downvotes because though you've provided sample data, there's no real indication of what the expected output should be and how it should be stored. You've had 3 solutions that all went in totally different directions, and yet none of them seem to have solved your issue given the lack of a green check or even an upvote indicating it was helpful. In the end, this lack of clarity only wastes your time and the time spent by others trying to solve your problem. The downvotes likely act as a signal to others that it's not worth the time to answer. — ALollz, Jul 24 '19 at 21:48
If you're scraping this data into these data structures you should probably fix the bad decisions that have taken you to this point. I can't imagine how exactly you ended up with this problem. — CJR, Jul 24 '19 at 22:00
Your comment only reflects your blatant lack of experience in Data Wrangling. But sure, I can explain one of many reasons why you can end up having a list of dicts with Data Frames like I did. In order to save and read Data Frames to json files, Pandas only has the DataFrame.to_json and pd.read_json functions that work for single Data Frames. So in order to avoid having hundreds of separated json files when you are scraping lots of sites simultanously, I figured a shortcut to put all my DF's with a key value in a dict each and succeded to have them in one single json file. — serra, Jul 25 '19 at 12:18
And by the way, the above data is a representation of my real data, so that anyone can copy paste it and have a model of it in the console, it is NOT the real data. The answer given by calestini does exactly what I was looking for and my scraping project is alive and kicking. Sometimes I wonder if people participate in this forum to really help the community or just for the sake of trashing on the others. — serra, Jul 25 '19 at 14:36

realr · Accepted Answer · 2019-07-24T21:15:37.787

You can loop and compare (assuming performance is not an issue). In this case I am creating the result in a list of dict, so you will have 4 final dict, each with the joined DataFrames.

results = []

for d1 in l1:
    l1 = list(d1.keys())[0]
    for d2 in l2:
        if  l1 == list(d2.keys())[0]:
            results.append({list(d1.keys())[0]: d1[l1].join(d2[l1], lsuffix='_suffix') })

results[0]['m']
Out[44]: 
   0_suffix  1_suffix  2_suffix  ...         1         2         3
0 -0.916366  0.340707  0.530867  ... -0.495796 -0.982780  2.354565
1 -0.154919 -0.899602 -0.595742  ...  0.425826  1.220378  1.169143
2  0.187701 -0.933682 -0.035998  ... -0.069469 -1.149548  1.558640
3  0.551400 -0.059035  0.913508  ...  2.001956 -0.130476 -0.896371

If the final output is the original left dataframe from l1, then just join to the initial obj:

for d1 in l1:
    l1 = list(d1.keys())[0]
    for d2 in l2:
        if  l1 == list(d2.keys())[0]:
            d1[l1] = d1[l1].join(d2[l1], lsuffix='_suffix')

d1
Out[46]: 
{'b':    0_suffix  1_suffix  2_suffix  ...         1         2         3
 0  0.369090  1.001815  0.576901  ... -0.886816  0.266766  1.008030
 1  0.585813 -0.725805  0.587021  ... -0.047143  1.319553  0.137310
 2  0.305254  0.187430  0.567565  ...  0.927931  0.923679 -0.932878
 3 -1.648351 -0.605600 -1.614015  ... -0.100864 -0.750169  1.314675

If you meant a concatenation / append:

for d1 in l1:
    l1 = list(d1.keys())[0]
    for d2 in l2:
        if  l1 == list(d2.keys())[0]:
            d1[l1] = pd.concat([d1[l1], d2[l1]],axis=0, sort=False)

d1['b']
Out[50]: 
          0         1         2         3
0  0.394191 -1.049085 -0.105597  1.208647
1 -0.455081  1.093636  0.469112 -0.856876
2  0.645717 -2.101677  0.086884 -1.038289
3 -0.245927  0.756143  0.755207  1.249809
0  0.518625 -0.953318 -0.069464  0.670162
1  1.495597 -1.598482 -0.615926 -1.166068
2 -0.534093  0.786945 -0.546304 -0.322346
3 -0.635425 -2.574919  0.292062  0.856733

Works like a charm, thank you. In fact, I edited the question because I had put indexes instead of column names. Thanks. — serra, Jul 24 '19 at 22:51

M_S_N · Answer 2 · 2019-07-24T21:09:31.793

0

Here is simplest approach you can use

d1['m'].merge(d2['h'], how='outer')

Output:

       0            1           2           3
0   -0.846978   -0.103178   0.764094    -1.116666
1   1.845743    -2.232665   1.078197    1.437294
2   -0.244121   -0.952903   0.625375    0.907227
3   1.217737    -0.684253   -1.661790   -0.071234
4   1.591530    -0.098318   -0.015063   -0.210989
5   0.382137    1.139896    -0.775984   1.022370
6   0.182091    -1.626770   1.267648    0.987247
7   -1.231164   0.129947    1.773350    -0.880221

edited Jul 24 '19 at 21:09

answered Jul 24 '19 at 21:07

M_S_N

2,764
1
17
38

1

I need the code to search for the key by itself, not doing it manually. – serra Jul 24 '19 at 21:09

Join DataFrames which are values in Dicts contained in different Lists of Dicts

2 Answers2