
I have an iterative process that runs with different parameter values each iteration and I want to collect the parameter values and results and put them in a Pandas dataframe with a multi-index built from the sets of parameter values (which are unique).

On each iteration, the parameter values are in a dictionary like this, say:

params = {'p': 2, 'q': 7}

So it is easy to collect them in a list along with the results:

results_index = [
    {'p': 2, 'q': 7},
    {'p': 2, 'q': 5},
    {'p': 1, 'q': 4},
    {'p': 2, 'q': 4}
]
results_data = [
    {'A': 0.18, 'B': 0.18},
    {'A': 0.67, 'B': 0.21},
    {'A': 0.96, 'B': 0.45},
    {'A': 0.58, 'B': 0.66}
]

But I can't find an easy way to produce the desired multi-index from results_index.

I tried this:

df = pd.DataFrame(results_data, index=results_index)

But it produces this:

                     A     B
{'p': 2, 'q': 7}  0.18  0.18
{'p': 2, 'q': 5}  0.67  0.21
{'p': 1, 'q': 4}  0.96  0.45
{'p': 2, 'q': 4}  0.58  0.66

(The index did not convert into a MultiIndex)

What I want is this:

        A     B
p q            
2 7  0.18  0.18
  5  0.67  0.21
1 4  0.96  0.45
2 4  0.58  0.66

This works, but there must be an easier way:

df = pd.concat([pd.DataFrame(results_index), pd.DataFrame(results_data)], axis=1).set_index(['p', 'q'])

UPDATE:

This also works, but it makes me nervous: how can I be sure the parameter values are aligned with the level names?

index = pd.MultiIndex.from_tuples([tuple(i.values()) for i in results_index], 
                                  names=results_index[0].keys())
df = pd.DataFrame(results_data, index=index)

        A     B
p q            
2 7  0.18  0.18
  5  0.67  0.21
1 4  0.96  0.45
2 4  0.58  0.66
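One way to remove the dependence on dict ordering (a sketch, not from the original post): fix the level names once and look each value up by key, so a dict built as `{'q': 5, 'p': 2}` still lands in the right levels. The explicit `names` list is an assumption here; it could equally be taken from `results_index[0]`.

```python
import pandas as pd

results_index = [
    {'p': 2, 'q': 7},
    {'q': 5, 'p': 2},  # same parameters, keys inserted in a different order
    {'p': 1, 'q': 4},
    {'p': 2, 'q': 4},
]
results_data = [
    {'A': 0.18, 'B': 0.18},
    {'A': 0.67, 'B': 0.21},
    {'A': 0.96, 'B': 0.45},
    {'A': 0.58, 'B': 0.66},
]

# Assumed level order. Each tuple is built by key lookup in this order,
# so the insertion order of the individual dicts does not matter.
names = ['p', 'q']
tuples = [tuple(d[k] for k in names) for d in results_index]

index = pd.MultiIndex.from_tuples(tuples, names=names)
df = pd.DataFrame(results_data, index=index)
print(df)
```

A dict that is missing one of the names raises `KeyError` here instead of silently producing a misaligned index.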
Bill
    `pd.DataFrame({**x, **y} for x,y in zip(results_index, results_data)).set_index(['p', 'q'])` works, but honestly not too different from your `concat` solution. – ALollz Jan 17 '19 at 02:46

4 Answers


I ran into this recently and it seems there's a slightly cleaner way than the accepted answer:

results_index = [
    {'p': 2, 'q': 7},
    {'p': 2, 'q': 5},
    {'p': 1, 'q': 4},
    {'p': 2, 'q': 4}
]

results_data = [
    {'A': 0.18, 'B': 0.18},
    {'A': 0.67, 'B': 0.21},
    {'A': 0.96, 'B': 0.45},
    {'A': 0.58, 'B': 0.66}
]

index = pd.MultiIndex.from_frame(pd.DataFrame(results_index))

pd.DataFrame(results_data, index=index)

Outputs:

        A     B
p q            
2 7  0.18  0.18
  5  0.67  0.21
1 4  0.96  0.45
2 4  0.58  0.66
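A quick sanity check (my own sketch, not part of this answer) that `pd.MultiIndex.from_frame` gives the same frame as the `concat`/`set_index` approach from the question: each column of the parameter DataFrame becomes one index level, and the column names become the level names.

```python
import pandas as pd

results_index = [
    {'p': 2, 'q': 7},
    {'p': 2, 'q': 5},
    {'p': 1, 'q': 4},
    {'p': 2, 'q': 4},
]
results_data = [
    {'A': 0.18, 'B': 0.18},
    {'A': 0.67, 'B': 0.21},
    {'A': 0.96, 'B': 0.45},
    {'A': 0.58, 'B': 0.66},
]

# from_frame: one level per column, level names taken from the column names.
index = pd.MultiIndex.from_frame(pd.DataFrame(results_index))
df_from_frame = pd.DataFrame(results_data, index=index)

# The concat/set_index version from the question, for comparison.
df_concat = pd.concat(
    [pd.DataFrame(results_index), pd.DataFrame(results_data)], axis=1
).set_index(['p', 'q'])

print(df_from_frame.equals(df_concat))
```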
santon

Create dictionary of lists and pass to MultiIndex.from_arrays:

#https://stackoverflow.com/a/33046935
d = {k: [dic[k] for dic in results_index] for k in results_index[0]}
print(d)
{'p': [2, 2, 1, 2], 'q': [7, 5, 4, 4]}

mux = pd.MultiIndex.from_arrays(list(d.values()), names=list(d))

df = pd.DataFrame(results_data, index=mux)
print(df)
        A     B
p q            
2 7  0.18  0.18
  5  0.67  0.21
1 4  0.96  0.45
2 4  0.58  0.66
jezrael
    Yes, I think that since there appears to be no other way, the best approach is to collect all the parameter values in lists in the first place (checking on each iteration that the parameters are the same and that each value gets appended to the right list) and then use `pd.MultiIndex.from_arrays` at the end. There doesn't seem to be an easy way to make a MultiIndex from a list of dicts. Thanks. – Bill Jan 17 '19 at 17:33
  • See new answer from @santon using `pd.MultiIndex.from_frame`. – Bill Sep 10 '19 at 19:02
  • @Bill Yes, I see it. – jezrael Sep 10 '19 at 19:45
    Alternative with `pd.MultiIndex.from_tuples`: `tuples = [tuple(d.values()) for d in results_index]; index = pd.MultiIndex.from_tuples(tuples, names=list(results_index[0].keys())); df = pd.DataFrame(results_data, index=index)` – Anakhand Aug 11 '20 at 10:15
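The approach described in Bill's comment (accumulate one list per parameter during the loop, check the parameter names each iteration, then call `pd.MultiIndex.from_arrays` once at the end) might look roughly like this. `run_iterations` is a hypothetical stand-in for the real iterative process, and raising `ValueError` on a name mismatch is an assumed policy.

```python
import pandas as pd


def run_iterations():
    """Hypothetical stand-in for the iterative process: yields one
    (params, results) pair per iteration."""
    yield {'p': 2, 'q': 7}, {'A': 0.18, 'B': 0.18}
    yield {'p': 2, 'q': 5}, {'A': 0.67, 'B': 0.21}
    yield {'q': 4, 'p': 1}, {'A': 0.96, 'B': 0.45}  # key order may vary
    yield {'p': 2, 'q': 4}, {'A': 0.58, 'B': 0.66}


param_lists = {}  # level name -> list of that parameter's values
results_data = []

for params, result in run_iterations():
    if not param_lists:
        # The first iteration fixes the set (and order) of level names.
        param_lists = {k: [] for k in params}
    elif params.keys() != param_lists.keys():
        # Dict key views compare like sets, so only the names are checked.
        raise ValueError(f"unexpected parameter names: {sorted(params)}")
    # Append by name, so each value goes to the right list regardless of order.
    for k, v in params.items():
        param_lists[k].append(v)
    results_data.append(result)

index = pd.MultiIndex.from_arrays(list(param_lists.values()),
                                  names=list(param_lists))
df = pd.DataFrame(results_data, index=index)
print(df)
```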

I tried with .join()

df1 = pd.DataFrame(results_index)
df2 = pd.DataFrame(results_data)
result = df1.join(df2, how='outer').set_index(['p','q'])

I got the same result and found this easier. Hope this helps.

CAppajigowda

This is a variation on @jezrael's answer. It is slightly more concise and can cope with inconsistent parameter dictionaries (for example, keys in a different order, or a key missing in some iterations), but it is not quite as fast.

index_df = pd.DataFrame(results_index)
index = pd.MultiIndex.from_arrays(index_df.values.transpose(),
                                  names=index_df.columns)
pd.DataFrame(results_data, index=index)

Output:

        A     B
p q            
2 7  0.18  0.18
  5  0.67  0.21
1 4  0.96  0.45
2 4  0.58  0.66
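To illustrate what "inconsistent parameter dictionaries" can mean here (my own example, not from the answer): `pd.DataFrame(results_index)` aligns dict values by key, so a dict with its keys in a different order still lines up, and a dict with a missing parameter gets `NaN` for that level instead of raising. One side effect is that, once a `NaN` is present, `index_df.values` is a float array, so the integer parameters become floats in the index.

```python
import pandas as pd

results_index = [
    {'p': 2, 'q': 7},
    {'q': 5, 'p': 2},  # keys in a different order
    {'p': 1, 'q': 4},
    {'p': 2},          # 'q' missing in this iteration
]
results_data = [
    {'A': 0.18, 'B': 0.18},
    {'A': 0.67, 'B': 0.21},
    {'A': 0.96, 'B': 0.45},
    {'A': 0.58, 'B': 0.66},
]

# Columns come out as ['p', 'q'], with NaN where a key is absent.
index_df = pd.DataFrame(results_index)
index = pd.MultiIndex.from_arrays(index_df.values.transpose(),
                                  names=index_df.columns)
df = pd.DataFrame(results_data, index=index)
print(df)
```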
Bill