94

I have a DataFrame df which looks like this; date and Time are the two levels of a MultiIndex:

                           observation1   observation2
date          Time                             
2012-11-02    9:15:00      79.373668      224
              9:16:00      130.841316     477
2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621
2012-11-04    9:15:00      115.449437     122
              9:16:00      123.776946     555
              9:17:00      153.76646      344
              9:18:00      463.276946     212

I want to run some complex processing over each daily block of data.

The pseudocode would look like this:

 for date in (unique level-0 values of df's index):
     new_df = chunk of df for that date
     complex_process(new_df)

First of all, I could not find a way to access only the block for a single date, e.g.

2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621

and then send it for processing. I am doing this in a for loop, as I am not sure there is a way to do it without specifying the exact value of the level-0 index. I did some basic searching and found df.index.get_level_values(0), but it returns all the values, which causes the loop to run multiple times for a given day. I want to create a DataFrame per day and send it for processing.

Yantraguru

5 Answers

152

One easy way would be to groupby the first level of the index: iterating over the groupby object returns the group key and a subframe containing each group.

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:     
                    observation1  observation2
date       Time                               
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477

                    observation1  observation2
date       Time                               
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.766460           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.773330           621

                    observation1  observation2
date       Time                               
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.766460           344
           9:18:00    463.276946           212

You can also use droplevel to remove the first index level (the date level, which is constant within each group):

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df.droplevel(0))
     ...:
         observation1  observation2
Time
9:15:00     79.373668           224
9:16:00    130.841316           477
...
chrisb
  • That helps! I was following a rather roundabout way - first finding the index label and then slicing it using iloc. – Yantraguru Sep 21 '14 at 14:08
  • Where has this been for the last 3 years of my life? Such a clean method, thank you! – rgk Sep 05 '20 at 18:04
  • This method is neat, but how do you apply changes from these subframes back to the main dataframe? Looking them up through loc or iloc makes the computation extremely slow. – Manaslu Sep 23 '20 at 06:56
  • @Manaslu if you can wrap the changes into a function you can use `df.groupby('key').apply(function)` (see the sketch just after these comments) – bendl Oct 23 '20 at 14:23
  • Great! Fast and clean way to solve the problem. This is much faster than iterating over dates and getting `df.loc[date]`. – Hunaphu Aug 20 '21 at 12:17
  • I am facing a similar problem. Can this multiindex dataframe be grouped by observation instead of date? – Soumya C Feb 18 '22 at 21:45
  • @SoumyaC You can group by "time" with `df.groupby(level=1)` (or `df.groupby(level="Time")` ... or iterate column by column with `df.items()` – Jean-Francois T. May 20 '22 at 06:40
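Regarding the comment about applying changes back to the main frame: below is a minimal sketch of the groupby(...).apply(...) pattern @bendl mentions. The frame construction only mirrors the question's data for illustration, and normalize_block is a hypothetical per-day transformation, not part of the original answer.

import pandas as pd

# Illustrative frame mirroring the question's layout (first four rows).
idx = pd.MultiIndex.from_tuples(
    [("2012-11-02", "9:15:00"), ("2012-11-02", "9:16:00"),
     ("2012-11-03", "9:15:00"), ("2012-11-03", "9:16:00")],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {"observation1": [79.373668, 130.841316, 45.312814, 123.776946],
     "observation2": [224, 477, 835, 623]},
    index=idx,
)

# Hypothetical per-day transformation: scale observation1 within each day.
def normalize_block(block):
    block = block.copy()
    block["observation1"] /= block["observation1"].max()
    return block

# group_keys=False keeps the original two-level index instead of adding
# the group key as an extra index level.
result = df.groupby(level=0, group_keys=False).apply(normalize_block)
print(result)

Because the function receives a regular DataFrame for each day, the complex_process from the question could be dropped in here in place of normalize_block.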
8

What about this?

for idate in df.index.get_level_values('date'):
    complex_process(df.loc[idate], idate)
psorenson
  • Careful with this solution: each value of idate can be hit multiple times. You should be doing for idate in np.unique(df.index.get_level_values('date')): - note the additional np.unique. – H. Brandsmeier Jun 02 '16 at 09:02
  • I think `df.index.get_level_values('date').unique()` may be better, as @melbay pointed out. – Nate Stemen Jan 11 '18 at 19:53
5

Building on @psorenson's answer, we can get the unique level values and their related DataFrame slices without numpy as follows:

for date in df.index.get_level_values('date').unique():
    print(df.loc[date])
melbay
3

Late to the party, I found that the following works, too:

for date in df.index.unique("date"):
    print(df.loc[date])

It uses the optional level parameter of the Index.unique method, introduced in pandas 0.23.0.

You can specify either the level number or label.
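For instance (a small illustration added here, not part of the original answer), both forms return the same unique dates:

df.index.unique("date")  # refer to the level by its name
df.index.unique(0)       # or by its position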

sanzoghenzo
2

Another alternative:

for date in df.index.levels[0]:
    print(df.loc[date])

The difference from the df.index.unique("date") approach proposed by @sanzoghenzo is that it refers to the index level by its position rather than by its name.
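One caveat worth adding (not from the original answer): after the frame has been filtered, df.index.levels[0] can still list dates that no longer appear in any row, because a MultiIndex keeps its original level categories. A small sketch of the difference, using a hypothetical filter:

subset = df[df["observation2"] > 500]                  # hypothetical filter for illustration
print(subset.index.levels[0])                          # may still list every original date
print(subset.index.remove_unused_levels().levels[0])   # only the dates still present
print(subset.index.unique("date"))                     # likewise, only the dates still present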

Roger Vadim