
I am fairly new to Python and Pandas and have been searching for a solution for a couple of days with no luck. Here's the problem:

I have a data set like the one below, and I need to cull the first few values of some rows so that the highest value in each row ends up in column A. In the example below, rows 0 and 3 would drop the value in column A, row 4 would drop the values in columns A and B, and all remaining values would shift to the left.

    A   B   C   D
0   11  23  21  14
1   24  18  17  15
2   22  18  15  13
3   10  13  12  10
4   5   7   14  11

Desired

    A   B   C   D
0   23  21  14  NaN
1   24  18  17  15
2   22  18  15  13
3   13  12  10  NaN
4   14  11  NaN NaN

I've looked at df.shift(), but I don't see how to get that function to work on a per-row basis. Should I instead be using an array and a loop?

Any help is greatly appreciated.
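For reference, the sample data above can be reproduced with:

```python
import pandas as pd

# sample data from the table in the question
df = pd.DataFrame(
    {"A": [11, 24, 22, 10, 5],
     "B": [23, 18, 18, 13, 7],
     "C": [21, 17, 15, 12, 14],
     "D": [14, 15, 13, 10, 11]}
)
print(df)
```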

ghowe

4 Answers


You need to set all values to the left of each row's max to np.nan and then shift the non-null values left, using the solution in this question. I use the one from @cs95:

df_final = df[df.eq(df.max(1), axis=0).cummax(1)].apply(lambda x: sorted(x, key=pd.isnull), 1)

      A     B     C     D
0  23.0  21.0  14.0   NaN
1  24.0  18.0  17.0  15.0
2  22.0  18.0  15.0  13.0
3  13.0  12.0  10.0   NaN
4  14.0  11.0   NaN   NaN
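A step-by-step sketch of what the one-liner does. One caveat: on recent pandas versions, apply returning a list gives back a Series of lists rather than a DataFrame unless result_type="expand" is passed, so the sketch passes it explicitly and then restores the column names:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [11, 24, 22, 10, 5],
     "B": [23, 18, 18, 13, 7],
     "C": [21, 17, 15, 12, 14],
     "D": [14, 15, 13, 10, 11]}
)

# True where a cell equals its row max; cummax(axis=1) carries that True
# rightwards, so the mask is True from the max column to the end of each row
mask = df.eq(df.max(axis=1), axis=0).cummax(axis=1)

# keep only the masked cells; everything left of the max becomes NaN
masked = df[mask]

# sort each row so non-null values come first (pd.isnull: False sorts before True)
df_final = masked.apply(
    lambda x: sorted(x, key=pd.isnull), axis=1, result_type="expand"
)
df_final.columns = df.columns
print(df_final)
```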
Andy L.
import numpy as np

df = df.astype(float)  # allow NaN in the integer columns

for i in range(df.shape[0]):
    arr = list(df.iloc[i, :])
    c = 0
    # drop leading values until the row starts with its max
    while arr[0] != max(arr):
        arr.pop(0)
        c += 1
    # pad the row back to full width with NaN
    arr.extend([np.nan] * c)
    df.iloc[i, :] = arr
print(df)

I loop over every row, find its max value, remove the values before the max, and pad NaNs at the end so every row keeps the same number of columns.


You can loop over the unique shift amounts (there are fewer of these than rows) with a groupby and join the results back:

import pandas as pd

shifts = df.to_numpy().argmax(1)
pd.concat([gp.shift(-i, axis=1) for i, gp in df.groupby(shifts)]).sort_index()

      A     B     C     D
0  23.0  21.0  14.0   NaN
1  24.0  18.0  17.0  15.0
2  22.0  18.0  15.0  13.0
3  13.0  12.0  10.0   NaN
4  14.0  11.0   NaN   NaN
ALollz

One approach is to convert each row of the data frame to a list (excluding the index) and append NaN values. Then keep N elements, starting with the max value.

ncols = len(df.columns)
nans = [np.nan] * ncols

new_rows = list()

for row in df.itertuples():
    # convert each row of the data frame to a list
    # start at 1 to exclude the index;
    # and append list of NaNs
    new_list = list(row[1:]) + nans
    
    # find index of max value (excluding the NaNs we appended)
    k = np.argmax(new_list[:ncols])
    
    # collect `new row`, starting at max element
    new_rows.append(new_list[k : k+ncols])

# create new data frame
df_new = pd.DataFrame(new_rows, columns=df.columns)
df_new
jsmart