Find non previous non-same value from pandas column and assign it to the data frame as a pandas column

Question

I have a data frame like this

df
col1    col2
  A      1
  B      1
  C      1
  D      4
  E      4
  F      3
  G      6
  H      4
  I      4

Now I want to create another data frame from above with a new column col3 where previous not same value of col2 will be present, so final data frame will look like,

col1    col2    col3
  A      1       1
  B      1       1
  C      1       1
  D      4       1
  E      4       1
  F      3       4
  G      6       3
  H      4       6
  I      4       6

I could do this using a for loop and checking with the next, but the execution time will be more, looking for some pandas shortcuts/ pythonic way to do this most efficiently.

Please explain this `where previous not same value of col2 will be present` with example(s) — Balaji Ambresh, Jul 17 '20 at 14:11

MrNobody33 · Answer 1 · 2020-07-17T14:56:22.653

You could try with np.where, using pd.Dataframe.diff(), pd.Dataframe.ffill() and pd.Dataframe.bfill():

df['col3']=pd.Series(np.where(df['col2'].diff().ne(0), df['col2'].shift(),np.nan)).ffill().bfill()

Same as:

df['col3']=df['col2'][df['col2'].diff().ne(0)].shift().reindex(df.index).ffill().bfill()

Output:

df
  col1  col2  col3
0    A     1   1.0
1    B     1   1.0
2    C     1   1.0
3    D     4   1.0
4    E     4   4.0
5    F     3   4.0
6    G     6   3.0
7    H     4   6.0
8    I     4   6.0

score 1 · Accepted Answer · answered Jul 17 '20 at 14:15

1

A little bit logic here , 1st create the group key with cumsum and diff , then take only the first item of each group, and shift the value down , reindex back to original df's index

df['col3']=df.groupby(df.col2.diff().ne(0).cumsum()).col2.head(1).\
              shift().reindex(df.index,method='ffill').fillna(df.col2)
df['col3']
Out[180]: 
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
5    4.0
6    3.0
7    6.0
8    6.0
Name: col2, dtype: float64

answered Jul 17 '20 at 14:15

BENY

317,841
20
164
234

2

similar logic just with `loc` can also be: `df.loc[df['col2'].ne(df['col2'].shift()),'col2'].shift().reindex(df.index).ffill().bfill()` – anky Jul 17 '20 at 14:42

Find non previous non-same value from pandas column and assign it to the data frame as a pandas column

2 Answers2