
Hello, I want to store a dataframe in a cell of another dataframe.

I have daily data consisting of date, steps, and calories. In addition, I have minute-by-minute HR data for a specific date. It would be easy to put the minute-by-minute data in a two-dimensional list, but I fear that would make it harder to analyze later.
What would be the best practice if I want both kinds of data in one dataframe? Is it even possible to nest dataframes?
Any better ideas? Thanks!

Satsuki
  • you might want to use [`zarr`](https://zarr.readthedocs.io/en/stable/) or [`xarray`](http://xarray.pydata.org/en/stable/) instead of `pandas`. it provides N-dimensional arrays and dataframes and it seems to me that is what you need. – moshevi Jul 24 '18 at 18:42
  • R is able to do this very well, JSYK, It is a little harder with pandas because you can't store the data frame in a data frame. – Demetri Pananos Jul 24 '18 at 18:43
  • Thank you for your comment, currently pandas is only my option – Satsuki Jul 24 '18 at 18:45
  • @DemetriP, why not? See my answer below – sacuL Jul 25 '18 at 01:47
  • @sacul Oh! I tried a naive way of doing this, but I seem to be mistaken. – Demetri Pananos Jul 25 '18 at 15:22
  • See also [this answer](https://stackoverflow.com/a/53218939/2641825) about the opposite operation: unnest. An equivalent to R's unnest function is now available in pandas and it's called "explode". – Paul Rougieux Nov 27 '19 at 10:25

1 Answer


Yes, it is possible to nest dataframes, but I would recommend rethinking how you structure your data instead. The right structure depends on your application and on the analyses you want to run afterwards.
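As one possible alternative to nesting, here is a minimal sketch that keeps the daily data and the minute-level data as two flat dataframes linked by the date. The column names (`timestamp`, `hr`) and all values are made up for illustration; only date, steps, and calories come from the question.

```python
import pandas as pd

# Hypothetical daily summary: one row per date.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2018-07-23", "2018-07-24"]),
    "steps": [8200, 10450],
    "calories": [2100, 2350],
})

# Hypothetical minute-level heart rate: one row per timestamp.
hr = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-07-24 09:00", "2018-07-24 09:01", "2018-07-24 09:02",
    ]),
    "hr": [62, 64, 71],
})

# Derive the join key from the timestamp, then attach the daily columns.
hr["date"] = hr["timestamp"].dt.normalize()
combined = hr.merge(daily, on="date", how="left")

# Per-day aggregates of the minute data come from a plain groupby.
mean_hr = hr.groupby("date")["hr"].mean()
```

With this layout every value stays in an ordinary typed column, so filtering, grouping, and plotting work without unpacking anything first.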

How to "nest" dataframes into another dataframe

Your dataframe containing your nested "sub-dataframes" won't be displayed very nicely. However, just to show that it is possible to nest your dataframes, take a look at this mini-example:

Here we have 3 random dataframes:

>>> df1
          0         1         2
0  0.614679  0.401098  0.379667
1  0.459064  0.328259  0.592180
2  0.916509  0.717322  0.319057
>>> df2
          0         1         2
0  0.090917  0.457668  0.598548
1  0.748639  0.729935  0.680409
2  0.301244  0.024004  0.361283
>>> df3
          0         1         2
0  0.200375  0.059798  0.665323
1  0.086708  0.320635  0.594862
2  0.299289  0.014134  0.085295

We can make a main dataframe that includes these dataframes as values in individual "cells":

import pandas as pd

df = pd.DataFrame({'idx':[1,2,3], 'dfs':[df1, df2, df3]})
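For reference, a self-contained version of the construction could look like the sketch below. The way the three random dataframes are generated (NumPy's `default_rng` with an arbitrary seed) is an assumption, since the original does not show how `df1`, `df2`, and `df3` were created.

```python
import numpy as np
import pandas as pd

# Assumption: three 3x3 frames of uniform random numbers; the seed is arbitrary.
rng = np.random.default_rng(0)
df1, df2, df3 = (pd.DataFrame(rng.random((3, 3))) for _ in range(3))

# Each element of the 'dfs' list becomes the value of one cell.
df = pd.DataFrame({"idx": [1, 2, 3], "dfs": [df1, df2, df3]})

# The column holding the nested frames is a generic object column.
print(df["dfs"].dtype)
```

Because the column has `object` dtype, pandas treats each nested dataframe as an opaque Python object, which is why vectorized operations do not reach inside it.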

We can then access these nested dataframes as we would access any value in any other dataframe:

>>> df['dfs'].iloc[0]
          0         1         2
0  0.614679  0.401098  0.379667
1  0.459064  0.328259  0.592180
2  0.916509  0.717322  0.319057
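If you later need to analyze the nested data as a whole, one option is to flatten it again. The sketch below is only one way to do that, assuming every nested dataframe has the same columns; it uses `pd.concat` with the outer `idx` values as keys, and the level names `idx` and `row` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild a small nested frame (random values, arbitrary seed).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "idx": [1, 2, 3],
    "dfs": [pd.DataFrame(rng.random((3, 3))) for _ in range(3)],
})

# Operate on each nested frame one cell at a time.
df["n_rows"] = df["dfs"].apply(len)

# Stack the nested frames into one flat frame, labelled by the outer idx.
flat = pd.concat(dict(zip(df["idx"], df["dfs"])), names=["idx", "row"])

# Selecting on the first index level recovers a single nested frame.
first = flat.loc[1]
```

The flat frame has a two-level index, so `groupby(level="idx")` gives per-group statistics without touching the nested cells.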
sacuL