Subtract pandas dataframe using list to choose diferent columns in each iteration

Question

I'm trying to use a list as an index in a Dataframe subtract operation. However I get the following error: cannot do positional indexing on Index with these indexers

I have these two DataFrames:

df1:

index	t1	t2	t3	t4	t5	...	t950
a,1	0,00001	0,00002	0,00003	0,00004	0,00008	...	0,00004
a,2	0,00001	0,00002	0,00003	0,00005	0,00007	...	0,00004
b,1	0,00004	0,00003	0,00002	0,00006	0,00006	...	0,00001
b,2	0,00005	0,00004	0,00003	0,00007	0,00005	...	0,00002

df2:

index	t1	t2	t3	t4	t5	...	t950
a,1	0,00008	0,00007	0,00007	0,00006	0,00004	...	0,00002
a,2	0,00007	0,00006	0,00005	0,00004	0,00003	...	0,00002
b,1	0,00002	0,00001	0,00002	0,00003	0,00004	...	0,00004
b,2	0,00005	0,00006	0,00007	0,00008	0,00009	...	0,00004

And I have a list too which includes the index for each column from where the subtract should start:

index_col

[2,3,1,2]

My code nowadays is as follows:

result=df1.subtract(df2.iloc[:,index_col:].rename(columns=dict(zip(df2.iloc[:,index_col:].columns,df2.columns))

My expected result is:

index	t1	t2	t3	t4	t5	...	t950
a,1	-0,00006	-0,00004	-0,00001	...	...	...	0,00002
a,2	-0,00003	-0,00001	...	...	...	...	0,00002
b,1	-0,00003	-0,00001	0,00001	-0,00002	...	...	0,00004
b,2	-0,00002	-0,00004	-0,00006	...	...	...	0,00004

Where, for example, in the first row:

t1 - t3 // t2 - t4 // t3 - t5 because in df2 it should start in the third column (as the first index_col value reflects).

Do you know how I can calculate this subtract following my list as a column index ? I know I can do this with a loop but I want to try to avoid it and use the power of vectorization.

Thanks you so much!

Can you provide an example of the operation? It is not clear for me if you want to substract columns / rows ... — Zaero Divide, May 16 '22 at 21:41
@Zaero Divide I just add more information to my question. At the end I want to subtract columns but for each row it should start in a different column based on the list values. — F. Nielsen, May 16 '22 at 21:56
Just thinking out loud. I think I would convert both of these to numpy, then create a version of `df2` where the rows had been rotated by the number of steps you specify. After that, I could do `df1-df2` without any nonsense. — Tim Roberts, May 16 '22 at 22:02
Ah, I see. First row is df1 - df2/offset2. Second row is df1-df2/offset3. I agree with @TimRoberts answer, but vectorized implementation is not easy, see [this question](https://stackoverflow.com/questions/51200369/shift-rows-of-a-numpy-array-independently) — Zaero Divide, May 16 '22 at 22:06

score 2 · Accepted Answer · answered May 16 '22 at 22:22

Assuming you want to subtract 0 when the row runs out.

Let shifts be the list [2,3,1,2], what you call index_col.

Might not be the nicest/most elegant solution, but I think this will do what you want:

import numpy as np
from scipy.ndimage import shift

shifted = np.vstack([shift(row, -k) for row, k in zip(df2.values, shifts)])
result = df1 - shifted

Full example:

import numpy as np
import pandas as pd
from scipy.ndimage import shift


df1 = pd.DataFrame(np.random.random((4, 5)), index=list("ABCD"), columns=list("abcde"))
df2 = pd.DataFrame(np.random.random((4, 5)))
shifts = [2, 3, 1, 2]

>>> df1
          a         b         c         d         e
A  0.308420  0.591043  0.914204  0.407474  0.670670
B  0.371686  0.989710  0.823255  0.145337  0.437014
C  0.023323  0.590014  0.685122  0.558222  0.238016
D  0.996939  0.974608  0.117192  0.539702  0.622569

>>> df2
          0         1         2         3         4
0  0.825861  0.356216  0.668553  0.090403  0.915279
1  0.699088  0.239835  0.728388  0.143411  0.025266
2  0.600219  0.280567  0.267376  0.100538  0.511053
3  0.538059  0.705963  0.633112  0.138550  0.455539

>>> shifted = np.vstack([shift(row, -k) for row, k in zip(df2.values, shifts)])
>>> df1 - shifted
          a         b         c         d         e
A -0.360133  0.500640 -0.001074  0.407474  0.670670
B  0.228275  0.964444  0.823255  0.145337  0.437014
C -0.257244  0.322639  0.584583  0.047168  0.238016
D  0.363827  0.836059 -0.338347  0.539702  0.622569

If you don't want to import scipy, you can write your own shift function: see Shift elements in a numpy array.

Subtract pandas dataframe using list to choose diferent columns in each iteration

1 Answers1