
I have a time series dataframe and I would like to reindex it by Trials and Measurements.

Simplified, I have this:

                value
Trial         
    1     0        13
          1         3
          2         4
    2     3       NaN
          4        12
    3     5        34   

Which I want to turn into this:

                  value
Trial    
    1      0        13
           1         3
           2         4
    2      0       NaN
           1        12
    3      0        34

How can I best do this?

TheChymera

1 Answer


Just yesterday, the illustrious Andy Hayden added this feature, `cumcount`, to pandas. It will ship in version 0.13, which will be released any day now. See the usage example he added to the docs.

If you are comfortable installing the development version of pandas from source, you can use it now.

df = df.reset_index(level='Trial')  # move Trial from the index into a column
df['Measurements'] = df.groupby('Trial').cumcount()

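The following code is equivalent, if less pithy, and will work on any recent version of pandas. It assumes `import numpy as np` and `import pandas as pd`.

df = df.reset_index(level='Trial')  # move Trial from the index into a column
grouped = df.groupby('Trial', group_keys=False)
df['Measurements'] = grouped.apply(lambda x: pd.Series(np.arange(len(x)), x.index))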

Finally, call `df.set_index(['Trial', 'Measurements'], inplace=True)` to get your desired result.
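Put together, the steps above can be sketched end to end. This is a minimal sketch, assuming the frame looks like the one in the question: `Trial` is a named index level and the second level has no name. The names `index`, `flat`, and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: 'Trial' is a named level, the second level is unnamed.
index = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 2, 2, 3], [0, 1, 2, 3, 4, 5]], names=["Trial", None]
)
df = pd.DataFrame({"value": [13, 3, 4, np.nan, 12, 34]}, index=index)

# Move 'Trial' into a column; the unnamed level stays behind as a plain index.
flat = df.reset_index(level="Trial")

# Number the rows within each trial, restarting at 0 for every trial.
flat["Measurements"] = flat.groupby("Trial").cumcount()

# Replace the old index with (Trial, Measurements).
result = flat.set_index(["Trial", "Measurements"])
print(result)
```

Working on a copy (`flat`) rather than using `inplace=True` leaves the original `df` untouched, which makes it easier to compare the two frames while experimenting.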

Andy Hayden
Dan Allan
  • haha! "illustrious" +1 ;) (Note: cumcount also works with dupes in index, but "equivalent" doesn't... I was a bit cheeky in the docs and said "*essentially* equivalent" :p) – Andy Hayden Nov 20 '13 at 19:56
  • What if my index isn't called 'Measurements' but rather has no name at all? – TheChymera Nov 20 '13 at 20:06
  • Unnamed index levels can be specified using the `level` keyword, like `groupby(level=1)`. – Dan Allan Nov 20 '13 at 20:13
  • and how do I select the first subindex of the first index? df.ix[1,0] selects a column :-/ – TheChymera Nov 20 '13 at 20:35
  • also, each trial has ~242 measurements in my dataframe - for some reason the code you pasted above gives me measurement indices from 242 to 484 :/ – TheChymera Nov 20 '13 at 20:48
  • [Read](http://pandas.pydata.org/pandas-docs/dev/indexing.html#advanced-indexing-with-hierarchical-index). – Dan Allan Nov 20 '13 at 21:18
  • Yes, `df.ix['bar', 'two']` isn't working. Or rather it wasn't: apparently your code would not work if ['Trial'] was already set as an index when it was run. Strangely enough, after I run your code, df['Trial'] no longer works :( which is a pity, because I wanted to do this in order to better apply the same function to multiple trials (I want to downsample all of the measurements in every trial to two - just two). – TheChymera Nov 20 '13 at 21:22
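
For the follow-up questions in the comments, here is a minimal sketch of grouping by an index level and of selecting from the resulting MultiIndex. It assumes the question's frame and a current pandas release, where `.ix` has been removed and `.loc` takes its place. The names `counts` and `renumbered` are illustrative.

```python
import numpy as np
import pandas as pd

# The question's frame: 'Trial' is a named level, the second level is unnamed.
index = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 2, 2, 3], [0, 1, 2, 3, 4, 5]], names=["Trial", None]
)
df = pd.DataFrame({"value": [13, 3, 4, np.nan, 12, 34]}, index=index)

# Group by index level position, so no reset_index is needed.
counts = df.groupby(level=0).cumcount()

# Swap the unnamed second level for the per-trial counter.
renumbered = df.copy()
renumbered.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values(0), counts.to_numpy()],
    names=["Trial", "Measurements"],
)

# Select trial 1, measurement 0 with a tuple key.
first = renumbered.loc[(1, 0), "value"]

# Select every measurement of trial 2.
trial_two = renumbered.loc[2]
```

Once `Trial` is an index level, `df['Trial']` raises a `KeyError` because it is no longer a column; `df.index.get_level_values('Trial')` or `groupby(level='Trial')` reach it instead.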