I'm trying to create new dataframes using groupby on a MultiIndex dataframe df. Level 0 is a string identifier (vsl) and level 1 is a datetime index (BeginTime). Ultimately I want to determine the total time each vsl is associated with each DIV and DIS. Here's a snippet of df:
                          DIV  DIS
vsl  BeginTime
vsl1 2015-08-19 16:40:00  SAD  SAJ
     2015-08-20 03:45:00  SAD  SAJ
     2015-08-20 13:55:00  SAD  SAJ
...
vsl2 2015-06-11 07:10:00  NWD  NWP
     2015-06-11 16:35:00  NWD  NWP
     2015-06-12 01:50:00  NWD  NWP
     2015-06-12 11:25:00  NWD  NWP
...
vsl3 2015-06-24 02:40:00  MVD  MVN
     2015-06-24 06:50:00  MVD  MVN
     2016-01-21 13:05:00  NAD  NAN
     2016-01-21 23:35:00  NAD  NAN
...
[6594 rows x 2 columns]
I've checked "How to iterate over pandas multiindex dataframe using index" and came up with this, which doesn't do what I want:
for vsl, new_df in df.groupby(level=0):
    vsl = new_df
I was expecting new dataframes ['vsl1', 'vsl2', 'vsl3'], each with the contents of the corresponding groupby dataframe, e.g. for vsl1:
                          DIV  DIS
vsl  BeginTime
vsl1 2015-08-19 16:40:00  SAD  SAJ
     2015-08-20 03:45:00  SAD  SAJ
     2015-08-20 13:55:00  SAD  SAJ
...
[411 rows x 2 columns]
If I call vsl1:
In [102]: vsl1
Traceback (most recent call last):
  File "<ipython-input-102-7a5664be723c>", line 1, in <module>
    vsl1
NameError: name 'vsl1' is not defined
If I call vsl:
In [103]: vsl
Out[103]:
                          DIV  DIS
vsl  BeginTime
vsl3 2015-06-24 02:40:00  MVD  MVN
     2015-06-24 06:50:00  MVD  MVN
     2016-01-21 13:05:00  NAD  NAN
     2016-01-21 23:35:00  NAD  NAN
...
[412 rows x 2 columns]
As a test, I tried printing each group as demonstrated in the referenced post:
In [104]: for vsl, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:
Out[104]:
                          DIV  DIS
vsl  BeginTime
vsl1 2015-08-19 16:40:00  SAD  SAJ
     2015-08-20 03:45:00  SAD  SAJ
     2015-08-20 13:55:00  SAD  SAJ
...
[411 rows x 2 columns]
                          DIV  DIS
vsl  BeginTime
vsl2 2015-06-11 07:10:00  NWD  NWP
     2015-06-11 16:35:00  NWD  NWP
     2015-06-12 01:50:00  NWD  NWP
     2015-06-12 11:25:00  NWD  NWP
...
[410 rows x 2 columns]
                          DIV  DIS
vsl  BeginTime
vsl3 2015-06-24 02:40:00  MVD  MVN
     2015-06-24 06:50:00  MVD  MVN
     2016-01-21 13:05:00  NAD  NAN
     2016-01-21 23:35:00  NAD  NAN
...
[412 rows x 2 columns]
What am I missing, and how can I create a new dataframe for each vsl contained in level 0?
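For reference, here is a minimal sketch of one possible way to keep each group, assuming a dict keyed by the level-0 label is an acceptable substitute for separate variables. The small df built here is made-up sample data that only mirrors the structure shown above, and the name frames is hypothetical. The idea is that `vsl = new_df` inside the loop just rebinds the loop variable on every pass, so only the last group survives; storing each group under its label keeps all of them.

```python
import pandas as pd

# Made-up sample data with the same shape as the df in the question:
# level 0 is the vessel label, level 1 is a datetime.
index = pd.MultiIndex.from_tuples(
    [
        ("vsl1", pd.Timestamp("2015-08-19 16:40:00")),
        ("vsl1", pd.Timestamp("2015-08-20 03:45:00")),
        ("vsl2", pd.Timestamp("2015-06-11 07:10:00")),
        ("vsl3", pd.Timestamp("2015-06-24 02:40:00")),
        ("vsl3", pd.Timestamp("2016-01-21 13:05:00")),
    ],
    names=["vsl", "BeginTime"],
)
df = pd.DataFrame(
    {
        "DIV": ["SAD", "SAD", "NWD", "MVD", "NAD"],
        "DIS": ["SAJ", "SAJ", "NWP", "MVN", "NAN"],
    },
    index=index,
)

# Iterating over a groupby yields (label, sub-dataframe) pairs.
# Collect them in a dict (hypothetical name `frames`) instead of
# rebinding the loop variable, so every group is retained.
frames = {vsl: new_df for vsl, new_df in df.groupby(level=0)}

# Look up one vessel's dataframe by its label.
print(frames["vsl1"])
```

With this, `frames["vsl1"]` plays the role of the `vsl1` variable expected above. If only a single vessel is needed, `df.loc["vsl1"]` may be enough without any groupby, though it drops the vsl level from the index.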