Most efficient way to take the difference of each column in a dataframe and re-order once done

Question

Hi, I have a dataframe with several columns. I'd like to either create a new dataframe or replace the columns between Timestamp and y_pred with the difference of each column, but I'd like the final result to keep the same column order. So CLES12Z would be replaced by the difference between the previous CLES12Z row and the current CLES12Z row, and the same would be done for every column up to y_pred.

So far I've tried the following:

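columnend = data.columns.get_loc('y_pred')

# Iterate over a snapshot of the labels, because the loop adds and drops columns
for e, col in enumerate(list(data.columns)):
    if 0 < e < columnend:  # skip Timestamp (position 0) and stop before y_pred
        print(col)
        data[col + 'Diff'] = data[col] - data[col].shift(1)
        data.drop([col], axis=1, inplace=True)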

But I'm noticing that this just puts all the new columns at the end, and I'd then have to re-sort the entire dataframe.
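If you keep the loop, one way to avoid re-sorting by hand is to compute the target column order before the loop and apply it with a single selection at the end. This is a sketch that assumes Timestamp is the first column and the columns being differenced are numeric; the frame and the column `CLF13` are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical sample data; only Timestamp, CLES12Z and y_pred come from the question
data = pd.DataFrame({
    "Timestamp": pd.date_range("2019-08-01", periods=4, freq="D"),
    "CLES12Z": [10.0, 12.0, 11.0, 15.0],
    "CLF13": [5.0, 5.5, 6.5, 6.0],  # hypothetical extra column
    "y_pred": [0.1, 0.2, 0.3, 0.4],
})

columnend = data.columns.get_loc("y_pred")
original = list(data.columns)

# Target order: same positions, but differenced columns carry the 'Diff' suffix
target = [c + "Diff" if 0 < i < columnend else c for i, c in enumerate(original)]

for e, col in enumerate(original):
    if 0 < e < columnend:
        data[col + "Diff"] = data[col].diff()  # same as data[col] - data[col].shift(1)
        data = data.drop(columns=col)

# One selection puts every column back in its original slot
data = data[target]
print(data.columns.tolist())
```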

I was wondering if there is a more direct or efficient way to do this?
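A more direct option, again a sketch assuming Timestamp sits in position 0 and y_pred ends the range, is to skip the renaming and write the differences back under the same labels. Assigning to existing labels replaces the values in place, so the column order never changes. The sample frame and `CLF13` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (Timestamp first, y_pred after the columns to difference)
data = pd.DataFrame({
    "Timestamp": pd.date_range("2019-08-01", periods=4, freq="D"),
    "CLES12Z": [10.0, 12.0, 11.0, 15.0],
    "CLF13": [5.0, 5.5, 6.5, 6.0],  # hypothetical extra column
    "y_pred": [0.1, 0.2, 0.3, 0.4],
})

columnend = data.columns.get_loc("y_pred")
# Columns strictly between Timestamp (position 0) and y_pred
to_diff = data.columns[1:columnend]

# Overwrite those columns with row-to-row differences; column order is untouched
data[to_diff] = data[to_diff].diff()
print(data)
```

If the leading NaN row is unwanted, `data = data.iloc[1:]` drops it afterwards.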

Your question is a bit confusing: are you trying to _replace_ your existing columns, or _add_ new columns with the differences? Please have a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and add sample input and output to your question (as text, not as pictures) so we can help you better. – G. Anderson, Aug 14 '19 at 19:54
@G.Anderson what's the best way to get my dataframe in text on here? I tried printing the head, but most of the columns of interest are left out. And yes, I'm trying to get a new dataframe (or the existing one) with the columns replaced by their differences. – novawaly, Aug 15 '19 at 12:31
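Two standard pandas ways to get a frame into a post as text when the default printout truncates columns are `DataFrame.to_string()`, which prints every column, and `DataFrame.to_csv()`, whose output others can paste back into `pd.read_csv`. Setting `pd.set_option("display.max_columns", None)` is another option. The frame here is a made-up stand-in.

```python
import io

import pandas as pd

# Made-up stand-in for the real data
df = pd.DataFrame({
    "Timestamp": pd.date_range("2019-08-01", periods=3, freq="D"),
    "CLES12Z": [10.0, 12.0, 11.0],
    "y_pred": [0.1, 0.2, 0.3],
})

# to_string() prints every column, unlike the truncated default display
print(df.head().to_string())

# A CSV snippet is easy to paste into a question...
snippet = df.head().to_csv(index=False)
print(snippet)

# ...and whoever answers can rebuild the frame from the pasted text
rebuilt = pd.read_csv(io.StringIO(snippet), parse_dates=["Timestamp"])
```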

Ted · Answer 1 · 2019-08-15T13:15:50.100


Hopefully this works with your data, but I think it's in the right general direction. Of course, we'll lose the first row: diff() leaves NaN in the first row of every differenced column, dropna() removes that row, and the remaining columns are sliced from the second row so the join lines up.

col_end = df.columns.get_loc('y_pred')
col_list = df.columns.tolist()[:col_end]
# Slice from col_end (not col_end + 1) so the y_pred column itself is kept
df = df[col_list].diff().dropna().join(df.iloc[1:, col_end:])
df.reset_index(drop=True, inplace=True)
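A quick sanity check of the join approach on a small hypothetical frame (with an extra column `other` after `y_pred`). This sketch slices the right-hand part from `col_end` so that `y_pred` itself survives, and works on a copy named `out` so the input stays visible.

```python
import pandas as pd

# Hypothetical frame: two columns to difference, then y_pred and an extra column
df = pd.DataFrame({
    "Timestamp": pd.date_range("2019-08-01", periods=4, freq="D"),
    "CLES12Z": [10.0, 12.0, 11.0, 15.0],
    "CLF13": [5.0, 5.5, 6.5, 6.0],
    "y_pred": [0.1, 0.2, 0.3, 0.4],
    "other": [1, 2, 3, 4],
})

col_end = df.columns.get_loc("y_pred")
col_list = df.columns.tolist()[:col_end]

# diff() + dropna() remove the first row of the differenced block; iloc[1:] removes
# the same row from y_pred onward, so both halves share index 1..3 before the join
out = df[col_list].diff().dropna().join(df.iloc[1:, col_end:])
out = out.reset_index(drop=True)
print(out)
```

Because `Timestamp` is in `col_list`, it comes out as one-day time differences rather than dates, which is why keeping it unchanged needs separate handling.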

Edit

If you need the Timestamp column to remain the same:

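col_end = df.columns.get_loc('y_pred')
# Leave Timestamp (column 0) untouched and difference only the columns before y_pred.
# iloc is needed here: df.loc slices columns by label, not by integer position.
df = df.iloc[:, :1].join(df.iloc[:, 1:col_end].diff()).join(df.iloc[:, col_end:])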

edited Aug 15 '19 at 13:15

answered Aug 15 '19 at 12:55


ahh man, this is so close. Is there no way to keep that first column (Timestamp) as is? – novawaly Aug 15 '19 at 13:09
@novawaly Does that edit help? It will leave the Timestamp column as it was, but with `NaN` values in the first row of the columns affected by `diff`. – Ted Aug 15 '19 at 13:17
