3

Given the following data:

data = {'a' : [1,1,1,8,8,3,3,3,3,4,4] }
df = pd.DataFrame(data)

I would now like to shift the whole thing down by n groups, so that their current order is preserved. The desired output for a shift of n=1 would be:

desired_output = {'a': [NaN,NaN,NaN,1,1,8,8,8,8,3,3] }
desired_output_df = pd.DataFrame(desired_output)

a shift of n=2 should be:

desired_output = {'a': [NaN,NaN,NaN,NaN,NaN,1,1,1,1,8,8] }
desired_output_df = pd.DataFrame(desired_output)

I have been messing around with groupby/transform/apply but haven't gotten anything to work so far. If I groupby and then shift, it shifts each group giving the output of:

NOT_desired_output = {'a' : [NaN, 1, 1, NaN, 8, NaN, 3,3,3, NaN, 4]}

I could brute force it by iterating, but I'm sure there's a better solution. Any ideas?

Alex Riley
  • 169,130
  • 45
  • 262
  • 238

1 Answers1

2

This is an interesting operation. I can think of an alternative way to do it with replace.

To shift by 1 group:

>>> df['b'] = df.a.shift()
>>> x = df[df.a != df.b]
>>> df.replace(*x.values.T)

Which gives the DataFrame:

     a   b
0  NaN NaN
1  NaN NaN
2  NaN NaN
3    1 NaN
4    1   1
5    8   1
6    8   8
7    8   8
8    8   8
9    3   8
10   3   3

And we just want column a of this DataFrame:

desired_output_df = pd.DataFrame(_, columns=['a'])

To shift by more than one group, you just need to shift column b of x. If you want to shift by n groups, you need to shift x.b an additional n-1 times. Just insert the line

>>> x.b = x.b.shift(n-1)

after x = df[df.a != df.b] and then perform the df.replace(*x.values.T) step.

Alex Riley
  • 169,130
  • 45
  • 262
  • 238
  • Interesting-- I'm trying to see if this works for shifting more than once and updating the question to include that – user3738579 Dec 04 '14 at 22:35
  • ah great- i'm still trying to understand exactly how it works, but it clearly does work! – user3738579 Dec 04 '14 at 22:42
  • `(*x.values.T)` is tuple unpacking of a NumPy array by column (see [this](http://stackoverflow.com/a/27047043/3923281) answer for an explanation). It essentially uses columns `a` and `b` for the appropriate find/replace lists. – Alex Riley Dec 04 '14 at 22:46