1

Here is a sample dataframe:

                     datetime temp  T1  T2  T3  T4  T5
115 2020-01-04 02:53:00+00:00   58   0   0   0   0   0
116 2020-01-04 03:53:00+00:00   51   0   0   0   0   0
117 2020-01-04 04:53:00+00:00   49   0   0   0   0   0
118 2020-01-04 05:53:00+00:00   48   0   0   0   0   0
119 2020-01-04 06:00:00+00:00   48   0   0   0   0   0
120 2020-01-04 06:53:00+00:00   47   0   0   0   0   0

And here is what I want the output to be :

                     datetime temp  T1  T2  T3  T4  T5
115 2020-01-04 02:53:00+00:00   58   0   0   0   0   0
116 2020-01-04 03:53:00+00:00   51   58  0   0   0   0
117 2020-01-04 04:53:00+00:00   49   51  58  0   0   0
118 2020-01-04 05:53:00+00:00   48   49  51  58  0   0
119 2020-01-04 06:00:00+00:00   48   48  49  51  58  0
120 2020-01-04 06:53:00+00:00   47   48  48  49  51  58
ironclock
  • 75
  • 1
  • 5

2 Answers2

4

Use Series.shift

for col in df.columns[df.columns.str.contains('T')]:
    df[col] = df['temp'].shift(int(col[1:]),fill_value = 0)
print(df)

We can also use pd.Index.difference

for col in df.columns.difference(['datetime','temp']):
        df[col] = df['temp'].shift(int(col[1:]),fill_value = 0)

Output

                      datetime  temp  T1  T2  T3  T4  T5
115  2020-01-04-02:53:00+00:00    58   0   0   0   0   0
116  2020-01-04-03:53:00+00:00    51  58   0   0   0   0
117  2020-01-04-04:53:00+00:00    49  51  58   0   0   0
118  2020-01-04-05:53:00+00:00    48  49  51  58   0   0
119  2020-01-04-06:00:00+00:00    48  48  49  51  58   0
120  2020-01-04-06:53:00+00:00    47  48  48  49  51  58
ansev
  • 30,322
  • 5
  • 17
  • 31
  • better to use `df.columns.str.startswith('T')` right? – hongsy Jan 28 '20 at 00:56
  • So Series.shift works but only up to T9. Any further and it starts over again. How can I fix this? Thanks for the dope solution btw! – ironclock Jan 31 '20 at 17:42
  • 1
    use `col[1:]` instead `col1[-1]` – ansev Jan 31 '20 at 17:51
  • 1
    Thanks -- this works but only when using one character as the prefix (such as T), if I want to add an underscore or use 2 characters plus an underscore (eg. T_, TF_) I need to adjust it to col[2:] and col[3:] respectively. I'm sure there's a way to ignore the beginning characters however. – ironclock Jan 31 '20 at 18:19
  • 1
    `col.split('_')[-1]` – ansev Jan 31 '20 at 19:08
2

For using a numpy solution , I didn't find a builtin solution which uses shift by referencing an input list, but we can leverage this excellent answer courtesy @Divakar and use it to get our solution by creating the required arrays from our dataframe:

cols = df.columns[2:]
mat = np.ones((len(df),len(cols))) * df['temp'][:,None]
r = np.arange(1,len(cols)+1)

df[cols]=strided_indexing_roll(mat.T,r).T
print(df)

                    datetime  temp    T1    T2    T3    T4    T5
0  2020-01-04 02:53:00+00:00    58   0.0   0.0   0.0   0.0   0.0
1  2020-01-04 03:53:00+00:00    51  58.0   0.0   0.0   0.0   0.0
2  2020-01-04 04:53:00+00:00    49  51.0  58.0   0.0   0.0   0.0
3  2020-01-04 05:53:00+00:00    48  49.0  51.0  58.0   0.0   0.0
4  2020-01-04 06:00:00+00:00    48  48.0  49.0  51.0  58.0   0.0
5  2020-01-04 06:53:00+00:00    47  48.0  48.0  49.0  51.0  58.0

Note: Change the line p = np.full((a.shape[0],a.shape[1]-1),np.nan) to p = np.full((a.shape[0],a.shape[1]-1),0) in the function.

anky
  • 74,114
  • 11
  • 41
  • 70