How to place successive rows as a list besides Pandas dataframe original rows based on a specific condition?

Question

I have a Pandas dataframe of the following format, which consists of a ColumnA with entries, and an IsMatching Column with True/False values.

       ColumnA                IsMatching
0.     asdasdas                 True
1.     bsdasdas                 False
2.     csdasdas                 False
3.     dsdasdas                 True
4.     4455sdas                 False
5.     asdasdas                 False
6.     ppdasdas                 False
7.     jjdasdas                 True
...

The aim is to create a new dataframe column (NewColumn), which contains ColumnA elements in between the 2 True instances as a list (wherever a True occurs in IsMatching Column, the list needs to be placed there, otherwise, the NewColumn values should be kept empty).

               ColumnA                   NewColumn                                IsMatching
        0.     asdasdas                 [bsdasdas,csdasdas] # As 2 Falses          True
    between dataframe indices 0 and 3.
        1.     bsdasdas                  Keep This Empty                           False
        2.     csdasdas                  Keep This Empty                           False
        3.     dsdasdas                 [4455sdas,asdasdas,ppdasdas] # As 3 Falses 
between dataframe indices 3 and 7.
        4.     esdasdas                ....

Any suggestions and help in achieving this would be much appreciated.

What did you try until now ? This might help : https://stackoverflow.com/questions/64235312/how-to-implodereverse-of-pandas-explode-based-on-a-column — SpaceBurger, Mar 03 '21 at 09:25
@SpaceBurger thanks for the comment and the link to the related question. This is helpful, but the method seems to implode successive rows as a list, without any condition. As I want my list elements to be in between 2 Pandas rows based on conditions, I cannot infer how this `groupby` method can be used with conditions. Thanks. — JChat, Mar 03 '21 at 09:30

score 2 · Accepted Answer · answered Mar 03 '21 at 10:06

add an addition column that are groups of rows that belong together key
generate a list of ColumnA values against False
merge() it back to row in same key where IsMatching is True

import io
df = pd.read_csv(io.StringIO("""       ColumnA                IsMatching
0.     asdasdas                 True
1.     bsdasdas                 False
2.     csdasdas                 False
3.     dsdasdas                 True
4.     4455sdas                 False
5.     asdasdas                 False
6.     ppdasdas                 False
7.     jjdasdas                 True"""), sep="\s+")
df = df.reindex(df.index.astype(int))

df = df.assign(key=df.IsMatching.cumsum())
df = df.merge(df.loc[~df.IsMatching].assign(IsMatching=True).groupby(["IsMatching","key"])["ColumnA"].agg(NewColumn=list), 
         left_on=["IsMatching","key"], right_index=True, how="left").drop(columns="key")

	ColumnA	IsMatching	NewColumn
0	asdasdas	True	['bsdasdas', 'csdasdas']
1	bsdasdas	False	nan
2	csdasdas	False	nan
3	dsdasdas	True	['4455sdas', 'asdasdas', 'ppdasdas']
4	4455sdas	False	nan
5	asdasdas	False	nan
6	ppdasdas	False	nan
7	jjdasdas	True	nan

How to place successive rows as a list besides Pandas dataframe original rows based on a specific condition?

1 Answers1