I'm working in Python 3.x. I want to merge two dictionaries by key and build a dataframe for each key. An example should make this clear:

What I have:

import numpy as np
import pandas as pd

d1 = {(1, "Autumn"): np.array([2.5, 4.5, 7.5, 9.5]), (1, "Spring"): np.array([10.5, 11.7, 12.3, 15.0])}
d2 = {(1, "Autumn"): np.array([10.2, 13.3, 15.7, 18.8]), (1, "Spring"): np.array([15.6, 20, 23, 27])}

What I want to achieve:

d3 = {
    (1, "Autumn"): pd.DataFrame([[2.5, 10.2], [4.5, 13.3], [7.5, 15.7], [9.5, 18.8]], columns=["d1", "d2"]),
    (1, "Spring"): pd.DataFrame([[10.5, 15.6], [11.7, 20], [12.3, 23], [15.0, 27]], columns=["d1", "d2"]),
}

P.S.: I'm actually working on a RandomForestRegressor example. The dictionaries above hold my X and y values after the train/test split. I want X and y side by side in a dataframe so that I can plot them. Both dictionaries have the same size, the same keys, and the same number of values for each key.

PratikSharma
  • Shouldn't the `(1,"Autumn")` entry be `[[2.5, 10.2], [4.5, 13.3], [7.5, 15.7], [9.5, 18.8]]`? Also, will `d1` and `d2` always have the same keys (i.e. is it not possible for one dictionary to have a key that does not exist in the other dictionary)? – Joe Patten Jan 04 '19 at 17:50
  • corrected! thanks! no, all keys are present in both dictionaries and are unique ;) – PratikSharma Jan 04 '19 at 18:02
  • have you looked at `pandas.Dataframe.from_dict()`? – Yuca Jan 04 '19 at 18:08

1 Answer

Since all keys are present in both dictionaries (according to your comment), you can iterate over the keys of one dictionary and, for each key, build a dataframe from the matching entries in both dictionaries:

d3 = {}
for k in d1:
    # Stack the two 1-D arrays as rows, then transpose so each one becomes a column.
    d3[k] = pd.DataFrame(np.array([d1[k], d2[k]]).T, columns=["d1", "d2"])

Output:

{(1, 'Autumn'):   
    d1    d2
 0  2.5  10.2
 1  4.5  13.3
 2  7.5  15.7
 3  9.5  18.8, 
(1, 'Spring'):    
    d1    d2
 0  10.5  15.6
 1  11.7  20.0
 2  12.3  23.0
 3  15.0  27.0}
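
The same idea can be sketched so that it takes any number of dictionaries and uses their names as column labels. This is only an illustration: the helper name `merge_by_key` is made up, and it assumes every dictionary has the same keys and that the arrays under each key are 1-D and of equal length.

```python
import numpy as np
import pandas as pd

d1 = {(1, "Autumn"): np.array([2.5, 4.5, 7.5, 9.5]),
      (1, "Spring"): np.array([10.5, 11.7, 12.3, 15.0])}
d2 = {(1, "Autumn"): np.array([10.2, 13.3, 15.7, 18.8]),
      (1, "Spring"): np.array([15.6, 20, 23, 27])}


def merge_by_key(**named_dicts):
    """Hypothetical helper: build one DataFrame per key, one column per dictionary.

    Assumes all dictionaries share the same keys and hold equal-length 1-D arrays.
    """
    first = next(iter(named_dicts.values()))
    return {
        key: pd.DataFrame({name: d[key] for name, d in named_dicts.items()})
        for key in first
    }


# Keyword names become the column names of each DataFrame.
d3 = merge_by_key(d1=d1, d2=d2)
print(d3[(1, "Autumn")])
```

Passing the dictionaries as keyword arguments keeps the column names next to the data they label, so adding a third dictionary is just one more argument.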
Joe Patten
  • I had tried `for key, value in zip(d1, d2):` `d3 = pd.DataFrame([d1[value], d2[value]]).T` `d3.columns = ["d1", "d2"]`, but your answer is more helpful since it also allows adding more dictionaries and naming the columns. – PratikSharma Jan 04 '19 at 18:13
  • To get the desired result (with `d3` being a dictionary of dataframes), you could modify your code to this: `d3 = dict()` `for key, value in zip(d1, d2):` `d3[key] = pd.DataFrame([d1[value], d2[value]]).T` – Joe Patten Jan 04 '19 at 18:18
  • what if my d1 is of shape (91, 2) and d2 is (91,1)? the above solution doesn't seem to work. – PratikSharma Jan 07 '19 at 13:55
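
For the 2-D case raised in the last comment, one possible approach is to stack the arrays column-wise with `np.column_stack` rather than nesting them in `np.array(...)` and transposing, since the nested form needs both arrays to have the same shape. The following is a sketch only: it assumes each value in `d1` has shape (n, 2) and each value in `d2` has shape (n, 1), and the helper name `combine_2d`, the random data, and the column names are made up for illustration.

```python
import numpy as np
import pandas as pd


def combine_2d(d1, d2, columns):
    """Hypothetical helper: place d1[k] and d2[k] side by side for every key.

    Assumes both dictionaries share the same keys and that the arrays under
    each key have the same number of rows.
    """
    return {
        k: pd.DataFrame(np.column_stack((d1[k], d2[k])), columns=columns)
        for k in d1
    }


# Made-up data with the shapes mentioned in the comment.
rng = np.random.default_rng(0)
X = {(1, "Autumn"): rng.random((91, 2))}
y = {(1, "Autumn"): rng.random((91, 1))}

frames = combine_2d(X, y, columns=["x1", "x2", "y"])
print(frames[(1, "Autumn")].shape)
```

`np.column_stack` also accepts 1-D arrays and treats them as single columns, so the same helper should cover a `d2` whose arrays have shape (n,).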