shifting columns in dataframe and adding zeros at the start

Question

I have a data frame with more than 700 columns. I have an array whose length is equal to the rows of data frame so each array value corresponds to one row of data frame. the value in the array is basically a random column from the data frame. This is what I am looking for:

I want to shift all the columns mentioned in the array to the 100th column in my data frame. To do so, I can add zero columns to the start depending on how much shift I need to make. All the values in the array are less than 100, so I know I will have to add zero columns for all rows.

for instance if this is the data frame

 DF1
col1   col2  col3  col4  col5 col6...  col100....col700
21      321   52    74     74   55 .....  20 .... 447

array[0] = 6 so I will have to shift 96 columns so I will add 96 zeros to the start. and the new dataframe will be:

modified_DF1
    col1   col2  col3  col4  col5 col6... col 98    col99  col100....col796
     0      0      0    0     0    0 .....  52          75        55 .... 447

I have tried following code but it does not seem to be working:

def shift_r_peak(df, shift_array):
for i in range(df.shape[0]):
    shift_val = 100 - shift_array[i]
    new_row = np.zeros(df.shape[1])  # create a new row with zeros
    new_row[shift_val:shift_val+shift_array[i]+1] = df.iloc[i, :shift_array[i]+1].values  # insert the values from the original row at the appropriate position
    df.iloc[i] = new_row  # assign the new row to the original DataFrame
return df

Do each of the values in the array specify a shift for each individual row? So if `array = [20, 55]` you shift row 1 by 20 and row 2 by 55? — Nick, May 09 '23 at 02:48
Are the columns actually named `col1, col2, col3` etc. or are those just placeholders? — Nick, May 09 '23 at 02:49
@Nick apologies. for first comment teh columns should shift by 94 not 96. second comment ans: no its not that. each value represents the column value which is to be shifted. so if array = [20, 55], that means for row 1 column 20 should shift to 100 and for row 2 column 55 should shift to 100. and no columns are just named 1,2....700 . the ones i have mentioned are just place holders — Kathan Vyas, May 09 '23 at 03:52
So presumably you fill the end of the row with 0's as well to make all rows the same length? — Nick, May 09 '23 at 03:59

score 1 · Answer 1 · answered May 09 '23 at 04:52

Here is a possible solution using a function (with apply) to shift each row of the dataframe:

# sample data
df = pd.DataFrame(np.indices((6, 5))[0][1:])
#    0  1  2  3  4
# 0  1  1  1  1  1
# 1  2  2  2  2  2
# 2  3  3  3  3  3
# 3  4  4  4  4  4
# 4  5  5  5  5  5

shifts = list(range(5))
# [0, 1, 2, 3, 4]

def shift_r_peak(df, shift_array, max_shift=100):
    def shift_row(row, shift, max_shift):
        return [0] * (max_shift - shift) + list(row) + [0] * shift
    return pd.DataFrame(df.apply(lambda r:shift_row(r, shift_array[r.name], max_shift), axis=1).values.tolist())

out = shift_r_peak(df, shifts, 5)

Output:

   0  1  2  3  4  5  6  7  8  9
0  0  0  0  0  0  1  1  1  1  1
1  0  0  0  0  2  2  2  2  2  0
2  0  0  0  3  3  3  3  3  0  0
3  0  0  4  4  4  4  4  0  0  0
4  0  5  5  5  5  5  0  0  0  0

score 1 · Answer 2 · answered May 09 '23 at 06:38

An efficient solution might be to use the underlying numpy array.

First pad the array with zeros on the right, then apply a row-independent rolling:

# pip install scikit-image
from skimage.util.shape import view_as_windows as viewW

# same DataFrame example as that of @Nick
df = pd.DataFrame(np.indices((6, 5))[0][1:])
shifts = np.array([0,2,1,5,4])

# https://stackoverflow.com/a/51613442/16343464
def strided_indexing_roll(a, r):
    # Concatenate with sliced to cover all rolls
    a_ext = np.concatenate((a,a[:,:-1]),axis=1)

    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = a.shape[1]
    return viewW(a_ext,(1,n))[np.arange(len(r)), (n-r)%n,0]

max_shift = 5 # this would be 100 in your example
r = max_shift-shifts

out = pd.DataFrame(strided_indexing_roll(np.pad(df.to_numpy(),
                                                ((0, 0), (0, r.max()))),
                                         r),
                   index=df.index)

Alternatively, padding on the left and using a negative shift:

out = pd.DataFrame(strided_indexing_roll(np.pad(df.to_numpy(),
                                                ((0, 0), (max_shift-shifts.min(), 0))),
                                         -shifts),
                   index=df.index)

Output:

   0  1  2  3  4  5  6  7  8  9
0  0  0  0  0  0  1  1  1  1  1
1  0  0  0  2  2  2  2  2  0  0
2  0  0  0  0  3  3  3  3  3  0
3  4  4  4  4  4  0  0  0  0  0
4  0  5  5  5  5  5  0  0  0  0

It'd be interesting to know how this compares performance wise. I don't have scikit installed so can't test... — Nick, May 09 '23 at 10:04

shifting columns in dataframe and adding zeros at the start

2 Answers2