-1

I have a data frame like this

df
col1    col2
  A      1
  B      1
  C      1
  D      4
  E      4
  F      3
  G      6
  H      4
  I      4

Now I want to create another data frame from above with a new column col3 where previous not same value of col2 will be present, so final data frame will look like,

col1    col2    col3
  A      1       1
  B      1       1
  C      1       1
  D      4       1
  E      4       1
  F      3       4
  G      6       3
  H      4       6
  I      4       6

I could do this using a for loop and checking with the next, but the execution time will be more, looking for some pandas shortcuts/ pythonic way to do this most efficiently.

Kallol
  • 2,089
  • 3
  • 18
  • 33

2 Answers2

2

You could try with np.where, using pd.Dataframe.diff(), pd.Dataframe.ffill() and pd.Dataframe.bfill():

df['col3']=pd.Series(np.where(df['col2'].diff().ne(0), df['col2'].shift(),np.nan)).ffill().bfill()

Same as:

df['col3']=df['col2'][df['col2'].diff().ne(0)].shift().reindex(df.index).ffill().bfill()

Output:

df
  col1  col2  col3
0    A     1   1.0
1    B     1   1.0
2    C     1   1.0
3    D     4   1.0
4    E     4   4.0
5    F     3   4.0
6    G     6   3.0
7    H     4   6.0
8    I     4   6.0
MrNobody33
  • 6,413
  • 7
  • 19
1

A little bit logic here , 1st create the group key with cumsum and diff , then take only the first item of each group, and shift the value down , reindex back to original df's index

df['col3']=df.groupby(df.col2.diff().ne(0).cumsum()).col2.head(1).\
              shift().reindex(df.index,method='ffill').fillna(df.col2)
df['col3']
Out[180]: 
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
5    4.0
6    3.0
7    6.0
8    6.0
Name: col2, dtype: float64
BENY
  • 317,841
  • 20
  • 164
  • 234
  • 2
    similar logic just with `loc` can also be: `df.loc[df['col2'].ne(df['col2'].shift()),'col2'].shift().reindex(df.index).ffill().bfill()` – anky Jul 17 '20 at 14:42