
I'm trying to create new dataframes using groupby on a multiindex dataframe df. Level 0 is a string identifier, Level 1 is a datetime index. Ultimately I want to determine the total time each vsl is associated with each DIV and DIS. Here's a snippet of df:

                            DIV DIS
vsl    BeginTime            
vsl1   2015-08-19 16:40:00  SAD SAJ  
       2015-08-20 03:45:00  SAD SAJ   
       2015-08-20 13:55:00  SAD SAJ
       ...
vsl2   2015-06-11 07:10:00  NWD NWP
       2015-06-11 16:35:00  NWD NWP
       2015-06-12 01:50:00  NWD NWP
       2015-06-12 11:25:00  NWD NWP
       ...
vsl3   2015-06-24 02:40:00  MVD MVN
       2015-06-24 06:50:00  MVD MVN
       2016-01-21 13:05:00  NAD NAN
       2016-01-21 23:35:00  NAD NAN
       ...
[6594 rows x 2 columns]
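For the stated end goal (total time each vsl is associated with each DIV and DIS), splitting df may not be necessary. Below is a hedged sketch built on one assumption the question does not confirm: each row's DIV/DIS assignment lasts until the same vessel's next BeginTime. The small df is a stand-in made from the snippet's rows, and the `Duration` column name is hypothetical.

```python
import pandas as pd

# Stand-in for the question's df, built from a few rows of the snippet above
df = pd.DataFrame(
    {
        "vsl": ["vsl1", "vsl1", "vsl2", "vsl2", "vsl3", "vsl3"],
        "BeginTime": pd.to_datetime([
            "2015-08-19 16:40", "2015-08-20 03:45",
            "2015-06-11 07:10", "2015-06-11 16:35",
            "2015-06-24 02:40", "2016-01-21 13:05",
        ]),
        "DIV": ["SAD", "SAD", "NWD", "NWD", "MVD", "NAD"],
        "DIS": ["SAJ", "SAJ", "NWP", "NWP", "MVN", "NAN"],
    }
).set_index(["vsl", "BeginTime"])

# Turn the index levels into ordinary columns and sort each vessel's rows by time
flat = df.reset_index().sort_values(["vsl", "BeginTime"])

# Assumption: a row's assignment lasts until that vessel's next BeginTime.
# The last row of each vessel has no successor, so its Duration is NaT.
flat["Duration"] = flat.groupby("vsl")["BeginTime"].shift(-1) - flat["BeginTime"]

# Total time per (vsl, DIV, DIS); sum() skips NaT values
totals = flat.groupby(["vsl", "DIV", "DIS"])["Duration"].sum()
print(totals)
```

Under this rule, a long gap between records (vsl3 above) is credited entirely to the earlier DIV/DIS, so a different rule may suit the real data better.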

I've checked "How to iterate over pandas multiindex dataframe using index" and came up with the following, which doesn't do what I want:

for vsl, new_df in df.groupby(level=0):
    vsl = new_df

I was expecting new dataframes ['vsl1', 'vsl2', 'vsl3'], each holding the contents of its group from the groupby, i.e. for vsl1:

                            DIV DIS
vsl    BeginTime            
vsl1   2015-08-19 16:40:00  SAD SAJ  
       2015-08-20 03:45:00  SAD SAJ   
       2015-08-20 13:55:00  SAD SAJ
       ...
[411 rows x 2 columns]

If I call vsl1:

In [102]: vsl1
Traceback (most recent call last):

  File "<ipython-input-102-7a5664be723c>", line 1, in <module>
    vsl1

NameError: name 'vsl1' is not defined

If I call vsl:

In [103]: vsl
Out[103]:
                            DIV DIS
vsl    BeginTime            
vsl3   2015-06-24 02:40:00  MVD MVN
       2015-06-24 06:50:00  MVD MVN
       2016-01-21 13:05:00  NAD NAN
       2016-01-21 23:35:00  NAD NAN
       ...
[412 rows x 2 columns]
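This output is consistent with ordinary Python name binding. `vsl = new_df` rebinds the loop variable `vsl` on every pass, so no names such as `vsl1` are ever created. Once the loop finishes, `vsl` holds only the last group, which is vsl3 because groupby sorts keys by default. A minimal reproduction, using a small stand-in df built from the snippet's rows:

```python
import pandas as pd

# Stand-in for the question's df (two rows per vessel)
df = pd.DataFrame(
    {
        "vsl": ["vsl1", "vsl1", "vsl2", "vsl2", "vsl3", "vsl3"],
        "BeginTime": pd.to_datetime([
            "2015-08-19 16:40", "2015-08-20 03:45",
            "2015-06-11 07:10", "2015-06-11 16:35",
            "2015-06-24 02:40", "2016-01-21 13:05",
        ]),
        "DIV": ["SAD", "SAD", "NWD", "NWD", "MVD", "NAD"],
        "DIS": ["SAJ", "SAJ", "NWP", "NWP", "MVN", "NAN"],
    }
).set_index(["vsl", "BeginTime"])

for vsl, new_df in df.groupby(level=0):
    # `vsl` is first bound to the key ("vsl1", "vsl2", ...), then immediately
    # rebound to that group's DataFrame; no variable named vsl1 is created
    vsl = new_df

# Only the final group survives the loop
print(vsl.index.get_level_values("vsl").unique().tolist())
```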

As a test, I tried printing each group, as demonstrated in the referenced post:

In [104]: for vsl, new_df in df.groupby(level=0):
     ...:    print(new_df)
     ...:
Out[104]:
                            DIV DIS
vsl    BeginTime            
vsl1   2015-08-19 16:40:00  SAD SAJ  
       2015-08-20 03:45:00  SAD SAJ   
       2015-08-20 13:55:00  SAD SAJ
       ...
[411 rows x 2 columns]
                            DIV DIS
vsl    BeginTime            
vsl2   2015-06-11 07:10:00  NWD NWP
       2015-06-11 16:35:00  NWD NWP
       2015-06-12 01:50:00  NWD NWP
       2015-06-12 11:25:00  NWD NWP
       ...
[410 rows x 2 columns]
                            DIV DIS
vsl    BeginTime            
vsl3   2015-06-24 02:40:00  MVD MVN
       2015-06-24 06:50:00  MVD MVN
       2016-01-21 13:05:00  NAD NAN
       2016-01-21 23:35:00  NAD NAN
       ...
[412 rows x 2 columns]

What am I missing, and how can I create a new dataframe for each vsl contained in level 0?
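One common way to get a separate dataframe per level-0 value, without creating variable names dynamically, is a dict keyed by the vessel name. A minimal sketch, again with a small stand-in df; the name `vsls` is illustrative only. `DataFrame.xs` is shown as an alternative for pulling out a single vessel without a groupby.

```python
import pandas as pd

# Stand-in for the question's df (two rows per vessel)
df = pd.DataFrame(
    {
        "vsl": ["vsl1", "vsl1", "vsl2", "vsl2", "vsl3", "vsl3"],
        "BeginTime": pd.to_datetime([
            "2015-08-19 16:40", "2015-08-20 03:45",
            "2015-06-11 07:10", "2015-06-11 16:35",
            "2015-06-24 02:40", "2016-01-21 13:05",
        ]),
        "DIV": ["SAD", "SAD", "NWD", "NWD", "MVD", "NAD"],
        "DIS": ["SAJ", "SAJ", "NWP", "NWP", "MVN", "NAN"],
    }
).set_index(["vsl", "BeginTime"])

# {vessel name: that vessel's rows}, built from the groupby iterator
vsls = {name: group for name, group in df.groupby(level=0)}

vsl1 = vsls["vsl1"]  # keeps the full (vsl, BeginTime) MultiIndex
for name, frame in vsls.items():
    print(name, len(frame))

# Alternative for a single vessel: cross-section on the "vsl" level,
# keeping that level in the result
same_as_vsl1 = df.xs("vsl1", level="vsl", drop_level=False)
```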

Asked by user3512166
  • Maybe you can create an empty list and then fill it with the dataframes in the groupby loop. Then you can use indexing: `vsl1=vsls[0]` – jezrael Feb 08 '16 at 16:49
  • `vsls =[]` `for vsl, new_df in df.groupby(level=0): vsls.append(new_df)` – jezrael Feb 08 '16 at 16:56
  • That works. Come to think of it, it's probably easier to access vsls[i] for i in range(len(vsls)) than to know the unique vsl names (which I was going to use to name each new_df). Thanks! – user3512166 Feb 08 '16 at 17:57
  • Does this answer your question? [How to access pandas groupby dataframe by key](https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key) – Marine Galantin Jun 10 '20 at 17:20
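The two approaches from the comments can be sketched as follows: the list filled in the groupby loop (jezrael's suggestion) and `GroupBy.get_group` from the linked question. The parallel `names` list is a hypothetical addition that records which vessel sits at each list position, and the df is a small stand-in.

```python
import pandas as pd

# Stand-in for the question's df (two rows per vessel)
df = pd.DataFrame(
    {
        "vsl": ["vsl1", "vsl1", "vsl2", "vsl2", "vsl3", "vsl3"],
        "BeginTime": pd.to_datetime([
            "2015-08-19 16:40", "2015-08-20 03:45",
            "2015-06-11 07:10", "2015-06-11 16:35",
            "2015-06-24 02:40", "2016-01-21 13:05",
        ]),
        "DIV": ["SAD", "SAD", "NWD", "NWD", "MVD", "NAD"],
        "DIS": ["SAJ", "SAJ", "NWP", "NWP", "MVN", "NAN"],
    }
).set_index(["vsl", "BeginTime"])

grouped = df.groupby(level=0)

# Comment suggestion: collect each group's dataframe in a list
vsls = []
names = []  # hypothetical: the key belonging to each list position
for vsl, new_df in grouped:
    names.append(vsl)
    vsls.append(new_df)

vsl1 = vsls[0]  # first group in sorted key order

# Linked question: fetch one group by its key on demand
also_vsl1 = grouped.get_group("vsl1")
```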
