Converting lists of lists of tuples to MultiIndex pandas dataframe

Question

I have a structure like this:

[
  [
    ('2019-12-01', '0.03555', '0.03', '0.03', '0.03'), 
    ('2019-12-02', '0.03', '0.03', '1', '0.03')
  ],
  [
    ('2019-12-01', '0.111', '0.02', '0.03', '0.03'), 
    ('2019-12-02', '0.03', '0.03', '0.03', '0.03')
  ]
]

I would like each list entry to be an index in a pandas dataframe, with the tuples being rows in the df. Something like this:

                         LIST_1                      LIST_2
         date      p1    p2     p3    p4    |   p1    p2     p3    p4
0   2019-12-01  0.03555  0.03  0.03   0.03  | 0.03  0.03  0.03   0.03
1   2019-12-02     0.03  0.03     1   0.03  | 0.03  0.03  0.03   0.03

I know this is messy, to be honest, I'm unsure the best way to structure it in Pandas as I'm new to it, so any advice would be appreciated.

I have tried to flatten the strucutre using:

d = pd.DataFrame([t for lst in a for t in lst])

But then I just end up with a df as expect like this:

        0          1     2     3      4
0   2019-12-01  0.03555  0.03  0.03   0.03
1   2019-12-02     0.03  0.03     1   0.03
2   2019-12-01    0.111  0.02  0.03   0.03
3   2019-12-02     0.03  0.03  0.03   0.03

But this isn't suitable

jezrael · Accepted Answer · 2020-02-13T14:15:50.787

1

Use list comprehension for create first level of MultiIndex by range with length of list lst with f-strings.

Then use main list comprehension by all values of list with convert inner list to DateFrames, create index by first column by DataFrame.set_index, then rename columns by DataFrame.add_prefix.

Last join all list of DataFrames by concat with keys parameter for first level of MultiIndex and remove index name 0 by DataFrame.rename_axis:

L = [f'LIST_{i}' for i in range(1, len(lst)+1)]
df = (pd.concat([pd.DataFrame(x).set_index(0).add_prefix('p') for x in lst], axis=1, keys=L)
        .rename_axis(None))
print (df)
             LIST_1                   LIST_2                  
                 p1    p2    p3    p4     p1    p2    p3    p4
2019-12-01  0.03555  0.03  0.03  0.03  0.111  0.02  0.03  0.03
2019-12-02     0.03  0.03     1  0.03   0.03  0.03  0.03  0.03

edited Feb 13 '20 at 14:15

answered Feb 13 '20 at 12:44

jezrael

822,522
95
1,334
1,252

1

Thank you @jezrael :D - are you able to just edit the post and describe what each section is doing? will help me consolidate my understanding rather than just ctrl-c and ctrl-v - also could you show me an example of how to query such a structure – Bob Feb 13 '20 at 14:07
@Bob - I think for selecting you can check [this](https://stackoverflow.com/questions/53927460/select-rows-in-pandas-multiindex-dataframe/53927461#53927461) – jezrael Feb 13 '20 at 14:19
1

Thanks for the edit and will take a look at that post. – Bob Feb 13 '20 at 14:48
Whats the best way for me to add an index to the first col on the MultiIndex df – Bob Feb 13 '20 at 15:12
1

@Bob - hmmm, I think it is possible by `df = df.reset_index()` – jezrael Feb 13 '20 at 15:15
Yes I just got it as you replied `.rename_axis(NAME)` works :D – Bob Feb 13 '20 at 15:17
do you have any idea how I would group_by and sum the multiIndex ? – Bob Feb 13 '20 at 15:43

Converting lists of lists of tuples to MultiIndex pandas dataframe

1 Answers1