1

I want to shift horizontally each row of my dataframe test_1 following the values in the corresponding rows from another dataframe df_x. The value in each row of df_x should define the amount of steps shifted to the left.

test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]])

df_x = pd.DataFrame([[1],[3],[2]])

My expected output would be:

Out[157]: 
    0   1   2   3
0   2   3   4  NA
1  14  NA  NA  NA
2  23  24  NA  NA

I tried adapting from an answer to a similar question (how to shift columns in pandas DataFrame dynamically and independently?), and used this: test_1.apply(lambda x: x.shift(periods = -df_x.loc[x,:]), axis = 1).

However, I keep getting the following error:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([3, 4], dtype='int64'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"

PS: If anyone also knows how to do this without converting from Xarray to Pandas, that would be even better.

Henrique
  • 135
  • 6

3 Answers3

2

I see that @tlentali answered your question as an alternative answer you can also use this:

df_x_seq = np.squeeze((df_x.values))
Lens=np.squeeze((test_1.shape[1]-df_x).values)
    Id=-1
def shift(row):
    global Id
    Id+=1
    array=np.empty((row.shape[0]))
    array[:]=np.nan
    if Lens[Id]!=row.shape[0]:
        array[:Lens[Id]]=row.values[df_x_seq[Id]:]
        return  array 
    else:
        return row
 

enter image description here

Hakan Akgün
  • 872
  • 5
  • 13
  • Thank you for the reply. The answer seems to work, but when I changed `df_x = pd.DataFrame([[2],[2],[3]])`, I got an error: `ValueError: could not broadcast input array from shape (3,) into shape (1,)` – Henrique Aug 10 '21 at 12:59
  • Thank you! Actually, I've got this issue with the `df_x_seq` not being defined. – Henrique Aug 10 '21 at 13:35
  • Oh it's just squeezed version of df_x. – Hakan Akgün Aug 10 '21 at 13:37
  • Yes, it works perfectly. I accepted the other answer because it came first, but this one is perfect as well. thank you for your time! – Henrique Aug 10 '21 at 13:45
1

From your DataFrame :

>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]])
>>> test_1
    0   1   2   3
0   1   2   3   4
1   10  12  13  14
2   20  22  23  24

We can start by transposing the DataFrame like so :

>>> test_1 = test_1.T
>>> test_1
    0   1   2
0   1   10  20
1   2   12  22
2   3   13  23
3   4   14  24

Then, using df_x we can use a for loop to select the correct columns to shift and re-transpose test_1 again to get the expected result :

>>> df_x = pd.DataFrame([[0],[1],[2]])
>>> line=0
>>> for val in list(df_x.values.flatten()): 
...     test_1[line] = test_1[line].shift(periods=-val)
...     line+=1
>>> test_1 = test_1.T
    0       1       2       3
0   1.0     2.0     3.0     4.0
1   12.0    13.0    14.0    NaN
2   23.0    24.0    NaN     NaN

For df_x = pd.DataFrame([[0],[2],[2]]) :

>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]]) 
>>> test_1 = test_1.T 
>>> df_x = pd.DataFrame([[0],[2],[2]]) 

>>> line=0
>>> for val in list(df_x.values.flatten()): 
...     test_1[line] = test_1[line].shift(periods=-val)
...     line+=1  
>>> test_1 = test_1.T         
>>> test_1
    0       1       2   3
0   1.0     2.0     3.0 4.0
1   13.0    14.0    NaN NaN
2   23.0    24.0    NaN NaN

For df_x = pd.DataFrame([[2],[2],[0]]) :

>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]]) 
>>> test_1 = test_1.T 
>>> df_x = pd.DataFrame([[2],[2],[0]]) 

>>> line=0
>>> for val in list(df_x.values.flatten()): 
...     test_1[line] = test_1[line].shift(periods=-val)
...     line+=1  
>>> test_1 = test_1.T         
>>> test_1
    0       1       2       3
0   3.0     4.0     NaN     NaN
1   13.0    14.0    NaN     NaN
2   20.0    22.0    23.0    24.0

And for df_x = pd.DataFrame([[1],[3],[2]]) :

>>> test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]]) 
>>> test_1 = test_1.T 
>>> df_x = pd.DataFrame([[1],[3],[2]]) 

>>> line=0
>>> for val in list(df_x.values.flatten()): 
...     test_1[line] = test_1[line].shift(periods=-val)
...     line+=1  
>>> test_1 = test_1.T         
>>> test_1
    0       1       2     3
0   2.0     3.0     4.0   NaN
1   14.0    NaN     NaN   NaN
2   23.0    24.0    NaN   NaN
tlentali
  • 3,407
  • 2
  • 14
  • 21
  • Thank you very much for the answer. It worked well, but if I change my `df_x` from `[0],[1],[2]` to `[0],[2],[2]`, the results stop working: test_1 = pd.DataFrame([[1,2,3,4], [10,12,13,14], [20, 22, 23,24]]) test_1 = test_1.T df_x = pd.DataFrame([[0],[2],[2]]) for val in df_x.values: test_1[val[0]] = test_1[val[0]].shift(periods=-val[0]) test_1 = test_1.T test_1 Out[285]: 0 1 2 3 0 1.0 2.0 3.0 4.0 1 10.0 12.0 13.0 14.0 2 NaN NaN NaN NaN – Henrique Aug 10 '21 at 12:54
  • 1
    I updated the answer to cover your second example. Does it solve your question ? – tlentali Aug 10 '21 at 13:24
  • 1
    As you added a new example in your question, I updated my answer. – tlentali Aug 10 '21 at 13:35
0

There was a third answer that was also very good, but got deleted for some reason.

Solution:

test_2=(df_x.rename(columns={0:'s'})
            .join(test_1)
            .apply(lambda x:x.shift(-(x['s'])),axis=1)
            .drop(columns=['s']))
Henrique
  • 135
  • 6