2

I have a list of nested dictionaries in python with data that I want to analyze using pandas. Here's some example data:

[
  {
    "A": { "left": 1, "right": 2 },
    "B": { "left": 3, "right": 4 },
    "C": { "left": 5, "right": 6 },
  },
  {
    "A": { "left": 7, "right": 8 },
    "B": { "left": 9, "right": 10 },
    "C": { "left": 11, "right": 12 },
  },
  ...
]

And so on. As the example shows, each item in the array is a dictionary with the same keys, and each key points to a dictionary with the same keys. In table form, I imagine it should look something like this:

|idx|A         |B         |C         |
|   |left|right|left|right|left|right|
--------------------------------------
| 0 |   1|    2|   3|    4|   5|    6|
| 1 |   7|    8|   9|   10|  11|   12|
| 2 | ...                            |

What I want to do is aggregate the min, mean, and max of "left" and "right" across this list of dictionaries for each letter, so that it ends up with a DataFrame like this:

|idx|left_min|left_mean|left_max|right_min|right_mean|right_max|
----------------------------------------------------------------
| A |       1|        4|       7|        2|         5|        8|
| B |       3|        6|       9|        4|         7|       14|
| C |       5|        8|      11|        6|         9|       12|

I'm experienced with python but relatively new to pandas, so I'm trying to get it right within a pandas framework before I just write it myself in python. I've tried many different ways to mold and shape pandas DataFrames into this form but I can't quite manage it. Every attempt I've made has either multi-indexed in strange ways, or can't aggregate properly. I feel like I'm missing something fundamental here. Any help is appreciated.

  • This is multi-indexing in the column axis. Checkout [this](https://stackoverflow.com/questions/18470323/selecting-columns-from-pandas-multiindex) and [this](https://stackoverflow.com/questions/25189575/pandas-dataframe-select-columns-in-multiindex). – MYousefi Feb 08 '22 at 00:53
  • @MYousefi If I can get that list of dictionaries turned into a DataFrame with multi-indexed columns, do you know how to then aggregate the min, mean, and max like I'm looking for? Because I can do that first step, but I wasn't even sure if it was the best first step. If it is, I can include my code for my attempt to do that. – Jason Ericson Feb 08 '22 at 01:03
  • you're brave for attempting this in vanilla python :) I'd have thrown it in a database and chucked some SQL at it. – Umar.H Feb 08 '22 at 01:12
  • 1
    @Umar.H Ha, if I knew any SQL then maybe I would! I'm just a game dev doing some basic analysis and I happen to know python. :) – Jason Ericson Feb 08 '22 at 18:12

1 Answers1

2

Let's tackle this in steps.

pd.json_normalize

df = pd.json_normalize(data)

   A.left  A.right  B.left  B.right  C.left  C.right
0       1        2       3        4       5        6
1       7        8       9       10      11       12

str.split for the multi index.

df.columns = df.columns.str.split('.',expand=True)

     A          B          C
  left right left right left right
0    1     2    3     4    5     6
1    7     8    9    10   11    12

stack and .groupby.agg(['min','max','mean']) for those pesky aggregations.

df1 = df.stack(0).groupby(level=1).agg(['min','max','mean'])   


  left          right
   min max mean   min max mean
A    1   7  4.0     2   8  5.0
B    3   9  6.0     4  10  7.0
C    5  11  8.0     6  12  9.0

Finally a simple list comp to flatten your columns.

df1.columns = [f"{x}_{y}" for x,y in df1.columns]

print(df1)
   left_min  left_max  left_mean  right_min  right_max  right_mean
A         1         7        4.0          2          8         5.0
B         3         9        6.0          4         10         7.0
C         5        11        8.0          6         12         9.0
                        

or more succinctly as @sammywemmy

df1.columns = df1.columns.map('_'.join)
halfer
  • 19,824
  • 17
  • 99
  • 186
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • ``df.columns = df.columns.map('_'.join)``? – sammywemmy Feb 08 '22 at 02:22
  • Amazing! Thank you for the very clean solution here. I think the key pieces I was missing were a) the simple multi-index using a string operation, and b) stack(), which I hadn't heard of and will definitely investigate further. Also I'm still just wrapping my head around multi-indexing in general, and this helps a lot. Thanks! – Jason Ericson Feb 08 '22 at 18:11