Resample pandas dataframe only knowing result measurement count

Question

I have a dataframe which looks like this:

Trial    Measurement    Data
    0              0      12 
                   1       4
                   2      12
    1              0      12
                   1      12
    2              0      12
                   1      12
                   2     NaN
                   3      12

I want to resample my data so that every trial has just two measurements. So I want to turn it into something like this:

Trial    Measurement    Data
    0              0       8 
                   1       8
    1              0      12
                   1      12
    2              0      12
                   1      12

This rather uncommon task stems from the fact that my data has an intentional jitter on the part of the stimulus presentation.

I know pandas has a resample function, but I have no idea how to apply it to my second-level index while keeping the data in discrete categories based on the first-level index :(

Also, I wanted to iterate over my first-level indices, but apparently

for sub_df in np.arange(len(df['Trial'].max()))

won't work because 'Trial' is an index level, not a column, so pandas can't find it.
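One way to iterate over the first index level is to group on that level instead of looking it up as a column. The following is a minimal sketch, assuming the frame has a two-level index named Trial and Measurement as in the example above; the name trial_lengths is only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame with (Trial, Measurement) as a two-level index.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (2, 3)],
    names=["Trial", "Measurement"],
)
df = pd.DataFrame({"Data": [12, 4, 12, 12, 12, 12, 12, np.nan, 12]}, index=index)

# groupby(level=...) groups on an index level, so no 'Trial' column is needed.
trial_lengths = {}
for trial, sub_df in df.groupby(level="Trial"):
    # sub_df holds all rows of one trial and keeps the original two-level index.
    trial_lengths[int(trial)] = len(sub_df)

print(trial_lengths)  # {0: 3, 1: 2, 2: 4}
```

Each sub_df can then be processed per trial, which keeps the data separated by the first-level index.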

How does (0, 1) become 8 and not 12 but (2,1) becomes 6 (taking NaN as zero and not as missing data)? What's your resample rule? The mean of the first half of the data and the mean of the last, allowing overlapping, and setting NaN to 0? — DSM, Nov 20 '13 at 22:07
I have no predefined resampling rule I have to stick with; I gave the example because it seemed intuitive. You're right, though, that it's better to treat NaN as missing data. And yes, the first values become 8 due to the overlap, which I still think is the best approach. I have ~256 values per trial, so the one-value overlap will barely make a difference in any case. — TheChymera, Nov 20 '13 at 22:10
Do you need to index it like this? It may be easier to keep these as plain columns and use a groupby. — Andy Hayden, Nov 20 '13 at 22:17
no, actually I indexed it this way thinking it would make stuff easier :-/ — TheChymera, Nov 20 '13 at 22:25

score 1 · Answer 1 · answered Nov 20 '13 at 22:29

Well, it's not the prettiest I've ever seen, but from a frame looking like

>>> df
   Trial  Measurement  Data
0      0            0    12
1      0            1     4
2      0            2    12
3      1            0    12
4      1            1    12
5      2            0    12
6      2            1    12
7      2            2   NaN
8      2            3    12

then we can manually build the two "average-like" objects and then use pd.melt to reshape the output:

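import pandas as pd

# Per trial: mean of the first half and of the last half (the halves share one row when the length is odd)
grouped = df.groupby("Trial")["Data"]
avg = pd.DataFrame({0: grouped.agg(lambda x: x.head((len(x) + 1) // 2).mean()),
                    1: grouped.agg(lambda x: x.tail((len(x) + 1) // 2).mean())})
result = pd.melt(avg.reset_index(), id_vars="Trial", var_name="Measurement", value_name="Data")
result = result.sort_values(["Trial", "Measurement"]).set_index(["Trial", "Measurement"])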

which produces

>>> result

                   Data
Trial Measurement      
0     0               8
      1               8
1     0              12
      1              12
2     0              12
      1              12

Can I make it more general so that it performs these operations not just on "Data"? I have 11 data columns actually ^^ — TheChymera, Nov 21 '13 at 00:16
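One possible way to extend the same first-half/last-half averaging to several columns is to loop over the trials and average all value columns at once. This is only a sketch under assumptions: the column "Data2" and the list value_cols are made up for illustration and stand in for the real data columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Trial":       [0, 0, 0, 1, 1, 2, 2, 2, 2],
    "Measurement": [0, 1, 2, 0, 1, 0, 1, 2, 3],
    "Data":        [12, 4, 12, 12, 12, 12, 12, np.nan, 12],
    "Data2":       [1, 2, 3, 4, 5, 6, 7, 8, 9],  # made-up second data column
})

value_cols = ["Data", "Data2"]  # hypothetical: list all data columns here

pieces = {}
for trial, group in df.groupby("Trial"):
    half = (len(group) + 1) // 2  # halves share one row when the length is odd
    values = group[value_cols]
    # DataFrame.mean() averages every column and skips NaN by default.
    pieces[(trial, 0)] = values.head(half).mean()
    pieces[(trial, 1)] = values.tail(half).mean()

# The tuple keys become a two-level column index; transpose to get them as rows.
result = pd.concat(pieces, axis=1).T
result.index.names = ["Trial", "Measurement"]
print(result)
```

The "Data" column of this result matches the output of the answer above, and every other column in value_cols is averaged by the same rule.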
