3

I'm trying to use one column's values to shift another columns values by that amount. Pandas shift(), per the documentation, takes an integer, but is there a way to instead use a Series?

Current Code:

import pandas as pd

df = pd.DataFrame({ 'a':[1,2,3,4,5,6,7,8,9,10],
                    'b':[0,0,0,0,4,4,4,0,0,0]})

df['a'] = df['a'].shift(df['b'])

...which is of course not working.

Desired output:

    a  b
0   1  0
1   2  0
2   3  0
3   4  0
4   1  4
5   2  4
6   3  4
7   8  0
8   9  0
9  10  0

If it makes it easier, the shift will always be the same, so theoretically the 'b' series could be True / False or some other binary trigger, and the .shift() could still be an integer. Feels a little hacky going that route, but it would get the job done.

elPastor
  • 8,435
  • 11
  • 53
  • 81
  • Is it giving you a traceback? Or what is the current output? – Adam Jul 11 '17 at 00:58
  • @Adam, current traceback: `The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().` I didn't really expect it to work, my code was more illustrative. I'm playing with `.loc[]` and `df.index` but still not making headway. – elPastor Jul 11 '17 at 01:00
  • shift simply moves all values in the column by a given period not a select amount. https://stackoverflow.com/questions/10982089/how-to-shift-a-column-in-pandas-dataframe – Adam Jul 11 '17 at 01:04
  • @Adam - I'm not sure what to do with that comment. I think you understand what I'm trying to achieve. In fact, I don't need to use `.shift()`, just need to get the desired output. – elPastor Jul 11 '17 at 01:05
  • I think I do I am just trying to figure out how to word it. lol – Adam Jul 11 '17 at 01:10
  • @Adam - right on. Just figured it out and posted. Thanks for helping. – elPastor Jul 11 '17 at 01:11
  • The desired output you mention is the same as the input. What are you trying to do? – cs95 Jul 11 '17 at 01:19
  • @COLDSPEED - take a closer look at indicies 4-6. – elPastor Jul 11 '17 at 01:20
  • Uhm, so `df['a'] -= df['b']`? – cs95 Jul 11 '17 at 01:21
  • @COLDSPEED - in this particular case that happens to work. But if instead of `[1,2,3...]` it was `['apple', 'banana', 'cherry'...]` – elPastor Jul 11 '17 at 01:22
  • How do you shift strings? NaN? – cs95 Jul 11 '17 at 01:23

2 Answers2

2

we can use numba solution:

from numba import jit

@jit
def dyn_shift(s, step):
    assert len(s) == len(step), "[s] and [step] should have the same length"
    assert isinstance(s, np.ndarray), "[s] should have [numpy.ndarray] dtype"
    assert isinstance(step, np.ndarray), "[step] should have [numpy.ndarray] dtype"
    N = len(s)
    res = np.empty(N, dtype=s.dtype)
    for i in range(N):
        res[i] = s[i-step[i]]
    return res

result:

In [302]: df['new'] = dyn_shift(df['a'].values, df['b'].values)
# NOTE: we should pass Numpy arrays:   ^^^^^^^         ^^^^^^^

In [303]: df
Out[303]:
    a  b  new
0   1  0    1
1   2  0    2
2   3  0    3
3   4  0    4
4   5  4    1
5   6  4    2
6   7  4    3
7   8  0    8
8   9  0    9
9  10  0   10
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • I have no doubt this is a great answer, but now I need to go down the `numba` rabbit hole! Can you explain a little bit about what `numba` and `jit` do? I'm completely unfamiliar. Thanks. – elPastor Jun 25 '18 at 13:09
  • 1
    @pshep123, I don't think I can explain better than [Pandas documentation does](https://pandas.pydata.org/pandas-docs/stable/enhancingperf.html#using-numba) ;-) – MaxU - stand with Ukraine Jun 25 '18 at 14:04
1

Figured it out:

df.loc[df['b'] == 4, 'a'] = df['a'].shift(4)

...this is the 'hacky' version I referred to above. The first 4 is really just a trigger and the second 4 would be hard-coded.

elPastor
  • 8,435
  • 11
  • 53
  • 81
  • But what if column `b` is variable? Taking into account that the shift is prioritized by row number; i.e. the first row has the highest priority in case of conflict and one row needs to replace the other. – otayeby Jul 11 '17 at 01:17
  • @tiba - you're right. If it were more than binary, this would not be the solution. However, per my original post, this solution fits my needs. That said, if you have a solution that handles variables in `'b'`, then by all means, please post and I'll accept it. – elPastor Jul 11 '17 at 01:19
  • @pshep123 If you have strings in `a`, and you try to shift by, say 1. What happens? Now, shift by 4. What happens? Are they NaN in both instances? Are other rows affected by how much `a` is shifted? – cs95 Jul 11 '17 at 01:33
  • @COLDSPEED - there would only be `NaN`s if the shift went outside of the index. If not, it would simply be replaced, so, to simplify `['apple','banana']` would become `['apple','apple']` in the event that the shift column were `[0,1]`. – elPastor Jul 11 '17 at 01:36