I'm working with a pandas DataFrame that has a column holding a unique ID code for each client. Each ID code is repeated across several rows. Another column holds a boolean flag, true or false. I want to adjust the table so that, for every ID code, if at least one row has the flag set to true, all rows for that ID are set to true. For example, one client ID code could appear in 10 rows, with the flag false in 9 of them and true in one; I want all 10 rows to end up true. Here is what I tried:
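```
import pandas as pd

data = [
    {"id": "a", "flag": True},
    {"id": "a", "flag": True},
    {"id": "a", "flag": False},
    {"id": "b", "flag": False},
    {"id": "a", "flag": True},
    {"id": "a", "flag": True},
]
df = pd.DataFrame(data)

# Attempt: set mod_flag to True for every group that contains a True flag
df.groupby('id').filter(lambda x: (x['flag'] == True).any())['mod_flag'] = True
# Attempt: set everything else to False
df[df['mod_flag'] != True] = False
```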
But it throws a KeyError for mod_flag on the second of the last two lines (the one that compares `df['mod_flag']` to True). Any help would be greatly appreciated. Thanks!
EDIT:
Adding in a sample data table here for the desired output:
id | flag | mod_flag |
---|---|---|
a | False | False |
a | False | False |
b | False | True |
b | False | True |
b | True | True |
c | True | True |
c | True | True |
Where the rows with ID = b are the ones that need to be changed.
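
For reference, here is a minimal sketch of the behaviour described above. It is one possible approach rather than a definitive one: it assumes the sample rows from the table and uses `groupby(...).transform("any")` to copy each group's "any flag is true" result back onto every row of that group.

```python
import pandas as pd

# Sample rows matching the desired-output table above
df = pd.DataFrame({
    "id":   ["a", "a", "b", "b", "b", "c", "c"],
    "flag": [False, False, False, False, True, True, True],
})

# For each id, compute whether any flag in the group is True and
# broadcast that single result back onto every row of the group
df["mod_flag"] = df.groupby("id")["flag"].transform("any")

print(df)
```

Unlike `filter`, which returns a new subset of the rows, `transform` returns a result aligned with the original index, so it can be assigned straight back to `df` as a column.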