0

I have a Pandas dataframe of the following format, which consists of a ColumnA with entries, and an IsMatching Column with True/False values.

       ColumnA                IsMatching
0.     asdasdas                 True
1.     bsdasdas                 False
2.     csdasdas                 False
3.     dsdasdas                 True
4.     4455sdas                 False
5.     asdasdas                 False
6.     ppdasdas                 False
7.     jjdasdas                 True
...

The aim is to create a new dataframe column (NewColumn), which contains ColumnA elements in between the 2 True instances as a list (wherever a True occurs in IsMatching Column, the list needs to be placed there, otherwise, the NewColumn values should be kept empty).

               ColumnA                   NewColumn                                IsMatching
        0.     asdasdas                 [bsdasdas,csdasdas] # As 2 Falses          True
    between dataframe indices 0 and 3.
        1.     bsdasdas                  Keep This Empty                           False
        2.     csdasdas                  Keep This Empty                           False
        3.     dsdasdas                 [4455sdas,asdasdas,ppdasdas] # As 3 Falses 
between dataframe indices 3 and 7.
        4.     esdasdas                ....

       

Any suggestions and help in achieving this would be much appreciated.

JChat
  • 784
  • 2
  • 13
  • 33
  • 1
    What did you try until now ? This might help : https://stackoverflow.com/questions/64235312/how-to-implodereverse-of-pandas-explode-based-on-a-column – SpaceBurger Mar 03 '21 at 09:25
  • @SpaceBurger thanks for the comment and the link to the related question. This is helpful, but the method seems to implode successive rows as a list, without any condition. As I want my list elements to be in between 2 Pandas rows based on conditions, I cannot infer how this `groupby` method can be used with conditions. Thanks. – JChat Mar 03 '21 at 09:30

1 Answers1

2
  • add an addition column that are groups of rows that belong together key
  • generate a list of ColumnA values against False
  • merge() it back to row in same key where IsMatching is True
import io
df = pd.read_csv(io.StringIO("""       ColumnA                IsMatching
0.     asdasdas                 True
1.     bsdasdas                 False
2.     csdasdas                 False
3.     dsdasdas                 True
4.     4455sdas                 False
5.     asdasdas                 False
6.     ppdasdas                 False
7.     jjdasdas                 True"""), sep="\s+")
df = df.reindex(df.index.astype(int))

df = df.assign(key=df.IsMatching.cumsum())
df = df.merge(df.loc[~df.IsMatching].assign(IsMatching=True).groupby(["IsMatching","key"])["ColumnA"].agg(NewColumn=list), 
         left_on=["IsMatching","key"], right_index=True, how="left").drop(columns="key")

ColumnA IsMatching NewColumn
0 asdasdas True ['bsdasdas', 'csdasdas']
1 bsdasdas False nan
2 csdasdas False nan
3 dsdasdas True ['4455sdas', 'asdasdas', 'ppdasdas']
4 4455sdas False nan
5 asdasdas False nan
6 ppdasdas False nan
7 jjdasdas True nan
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30