I have time series data per row (with columns as time steps), and I'd like to left- and right-pad each row with 0s based on a per-row value ('Padding amount'): each row should be shifted right by that many steps, with the leftover columns on the right filled with 0s. This is what I have:

Padding amount     T1     T2     T3
   0               3      2.9    2.8
   1               2.9    2.8    2.7
   1               2.8    2.3    2.0
   2               4.4    3.3    2.3

And this is what I'd like to produce:

Padding amount     T1     T2     T3     T4     T5
   0               3      2.9    2.8    0      0    (--> padding = 0, so no change)
   1               0      2.9    2.8    2.7    0    (--> shifted one to the right)
   1               0      2.8    2.3    2.0    0
   2               0      0      4.4    3.3    2.3  (--> shifted two to the right)

I see that Keras has sequence padding, but I'm not sure how it would work here, since all rows have the same number of entries. I'm looking at pandas shift and np.roll, but I'm sure a solution for this already exists somewhere.
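A minimal sketch of the shift idea, assuming the time-step columns are named T1, T2, ... in order and the padding amounts are non-negative integers: first add the missing trailing columns filled with 0, then shift each row right by its own amount with `Series.shift(..., fill_value=0.0)`. The helper name `pad_rows_with_shift` is hypothetical.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Padding amount": [0, 1, 1, 2],
    "T1": [3.0, 2.9, 2.8, 4.4],
    "T2": [2.9, 2.8, 2.3, 3.3],
    "T3": [2.8, 2.7, 2.0, 2.3],
})

def pad_rows_with_shift(df, pad_col="Padding amount"):
    """Shift each row's time-step values right by that row's padding amount.

    Hypothetical helper; assumes the non-padding columns are T1..Tn in order.
    """
    pad = df[pad_col]
    steps = df.drop(columns=pad_col)
    width = steps.shape[1] + int(pad.max())
    # Create the missing trailing columns (T4, T5, ...) filled with 0.0
    steps = steps.reindex(columns=[f"T{i}" for i in range(1, width + 1)],
                          fill_value=0.0)
    # Shift each row right by its own amount; vacated leading cells become 0.0
    shifted = steps.apply(
        lambda row: row.shift(int(pad[row.name]), fill_value=0.0), axis=1)
    return pd.concat([pad, shifted], axis=1)

print(pad_rows_with_shift(df))
```

`DataFrame.apply(..., axis=1)` calls a Python function once per row, so this favors simplicity over speed; on large frames a vectorized NumPy indexing approach avoids that per-row overhead.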

Mad Physicist
Ellio
  • it is not exactly a dupe but [this](https://stackoverflow.com/questions/20360675/roll-rows-of-a-matrix-independently) could help you, except you need to create the missing columns first I guess – Ben.T May 20 '20 at 14:25
  • Why do you have a numpy tag? – Mad Physicist May 20 '20 at 15:29
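Following the first comment's suggestion ("create the missing columns first", then roll rows independently), a sketch in NumPy. It assumes non-negative integer padding amounts. Because every row gets `padding.max()` trailing zeros, rolling a row right by its own amount only wraps zeros around to the front, which is exactly the left padding wanted.

```python
import numpy as np

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])

# Step 1: create the missing columns first (zeros on the right)
widened = np.pad(data, ((0, 0), (0, padding.max())))
width = widened.shape[1]

# Step 2a: straightforward version, one np.roll call per row
rolled_loop = np.array([np.roll(row, p) for row, p in zip(widened, padding)])

# Step 2b: vectorized equivalent; output column j of row i takes
# input column (j - padding[i]) mod width, which is what np.roll does
src_cols = (np.arange(width) - padding[:, None]) % width
rolled = widened[np.arange(widened.shape[0])[:, None], src_cols]

print(rolled)
```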

2 Answers

In NumPy, you could construct an array of indices for the locations where you want to place your array elements.

Let's say you have

import numpy as np

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])
M, N = data.shape

The output array would be

output = np.zeros((M, N + padding.max()))

You can make an index of where the data goes:

rows = np.arange(M)[:, None]
cols = padding[:, None] + np.arange(N)

Since the index arrays broadcast to the shape of the data (rows has shape (M, 1) and cols has shape (M, N)), you can assign to the output directly:

output[rows, cols] = data
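The steps above, collected into one self-contained function so the result can be checked against the desired output from the question. The name `pad_rows` is hypothetical, and the sketch assumes `padding` holds non-negative integers; a negative value would make `cols` wrap around to the end of the row instead of raising an error.

```python
import numpy as np

def pad_rows(data, padding):
    """Place row i of `data` starting at column padding[i] of a zero array."""
    M, N = data.shape
    output = np.zeros((M, N + padding.max()), dtype=data.dtype)
    rows = np.arange(M)[:, None]            # shape (M, 1)
    cols = padding[:, None] + np.arange(N)  # shape (M, N)
    output[rows, cols] = data               # rows broadcasts against cols
    return output

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])
print(pad_rows(data, padding))
```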

I'm not sure exactly how this applies to a DataFrame, but you could construct a new one from the result of operating on the values of the old one. Alternatively, you could probably implement all of these operations equivalently in pandas directly.
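One way to build a new DataFrame from the NumPy result, sketched under the assumption that the frame has a 'Padding amount' column and that every other column is a time step: operate on the `.to_numpy()` values, then rebuild a frame with columns T1 ... Tk and re-attach the padding column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Padding amount": [0, 1, 1, 2],
    "T1": [3.0, 2.9, 2.8, 4.4],
    "T2": [2.9, 2.8, 2.3, 3.3],
    "T3": [2.8, 2.7, 2.0, 2.3],
})

# Work on plain NumPy arrays
padding = df["Padding amount"].to_numpy()
data = df.drop(columns="Padding amount").to_numpy(dtype=float)
M, N = data.shape

output = np.zeros((M, N + padding.max()))
output[np.arange(M)[:, None], padding[:, None] + np.arange(N)] = data

# Build a new DataFrame: keep the padding column, name time steps T1..Tk
result = pd.concat(
    [df[["Padding amount"]],
     pd.DataFrame(output, index=df.index,
                  columns=[f"T{i}" for i in range(1, output.shape[1] + 1)])],
    axis=1,
)
print(result)
```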

Mad Physicist
  • if you construct `padding=df['Padding amount'].to_numpy()` and `data=df.filter(regex=r'T\d').to_numpy()` where `df` is OP's dataframe, then putting back the `output` is less obvious but can be done with `df[[f'T{i}' for i in range(1, output.shape[1]+1)]] = pd.DataFrame(output, index=df.index)`, anyway your method is really nice – Ben.T May 20 '20 at 18:26
  • @Ben.T thanks. I'm not really a pandas guy so that's very helpful. – Mad Physicist May 20 '20 at 18:53

This is one way of doing it. I've made the process flexible in terms of how many time periods/steps it can handle:

import pandas as pd

# data (T2 for row 1 is 2.8, as in the question)
d = {'Padding amount': [0, 1, 1, 2],
     'T1': [3, 2.9, 2.8, 4.4],
     'T2': [2.9, 2.8, 2.3, 3.3],
     'T3': [2.8, 2.7, 2.0, 2.3]}
# create DF
df = pd.DataFrame(data=d)
# get max padding amount
maxPadd = df['Padding amount'].max()
# list of time-period columns (T1, T2, ...)
timePeriodsCols = [c for c in df.columns if c.startswith('T')]
# reversed list, so values are moved right-to-left without overwriting each other
reverseList = timePeriodsCols[::-1]
# number of periods
noOfPeriods = len(timePeriodsCols)

# create the new columns needed, filled with 0 (not '') so the result stays numeric
for i in range(noOfPeriods + 1, noOfPeriods + 1 + maxPadd):
    df['T' + str(i)] = 0.0

# loop over records
for i in df.index:
    # get padding amount
    padAmount = df.at[i, 'Padding amount']
    # if zero then do nothing
    if padAmount == 0:
        continue
    # else: move each value right by padAmount columns and set its old location to zero
    for col in reverseList:
        loc = df.columns.get_loc(col)
        df.at[i, df.columns[loc + padAmount]] = df.at[i, col]
        df.at[i, col] = 0

print(df)

   Padding amount   T1   T2   T3   T4   T5
0               0  3.0  2.9  2.8  0.0  0.0
1               1  0.0  2.9  2.8  2.7  0.0
2               1  0.0  2.8  2.3  2.0  0.0
3               2  0.0  0.0  4.4  3.3  2.3
Mit