Pandas Groupby, Filter, and Insert Column

Question

I'm working with a Pandas dataframe that has a column with a unique ID code representing a client. Each ID code is repeated in several rows in the table. There is another column in the table with a boolean flag, true or false. I am trying to adjust the table so that for every ID code, if there is at least one flag set to true, they will all be set to true; i.e. you could have one client ID code in 10 rows, and 9 of the rows have the flag set to false but one is set to true. I want all of the rows to now get set to true. Here is what I tried:

import pandas as pd

data=[
    {"id":"a","flag":True},
    {"id":"a","flag":True},
    {"id":"a","flag":False},
    {"id":"b","flag":False},
    {"id":"a","flag":True},
    {"id":"a","flag":True}]
df = pd.DataFrame(data)

df.groupby('id').filter(lambda x:(x['flag']==True).any())['mod_flag'] = True
df[df['mod_flag'] != True] = False

But it is throwing a key error on the second line for mod_flag. Any help would be greatly appreciated--thanks!

EDIT:

Adding in a sample data table here for the desired output:

id	flag	mod_flag
a	False	False
a	False	False
b	False	True
b	False	True
b	True	True
c	True	True
c	True	True

Where the rows with ID = b are the ones that need to be changed.

Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). Show where the intermediate results differ from what you expected. We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. — Prune, Feb 11 '21 at 19:14
Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. — Prune, Feb 11 '21 at 19:14
Yes, but your edits do not address the problems. Please follow the information in the links I provided. — Prune, Feb 11 '21 at 19:42

score 1 · Accepted Answer · answered Feb 11 '21 at 19:48

groupby() to bring all rows for each client together
transform() to get a value for each row
any() on each group's flag Series to check for at least one True

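import numpy as np
import pandas as pd

# Random sample data, so the values differ between runs
df = pd.DataFrame({"client_id":np.random.randint(1,5,8),
             "flag":np.random.choice([False,True], 8)}).sort_values("client_id")

# newflag is True for every row of a client that has at least one True flag
df.assign(newflag=df.groupby("client_id")["flag"].transform(lambda s: s.any()))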

	client_id	flag	newflag
3	1	True	True
6	1	False	True
0	2	True	True
2	2	True	True
7	2	True	True
1	3	True	True
4	3	False	True
5	4	False	False
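
Applied to the sample table from the question's edit, the same idea might look like the sketch below. The column names id, flag and mod_flag come from the question; passing the string "any" to transform() is assumed here as a shorthand for the lambda, and the result is stored as a new column rather than returned by assign().

```python
import pandas as pd

# Sample rows taken from the desired-output table in the question
df = pd.DataFrame({
    "id":   ["a", "a", "b", "b", "b", "c", "c"],
    "flag": [False, False, False, False, True, True, True],
})

# True for every row whose id group contains at least one True flag
df["mod_flag"] = df.groupby("id")["flag"].transform("any")
print(df)
```

Only the rows with id b should change: their mod_flag becomes True even where flag is False.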

This worked perfectly--thank you so much! — Lle.4, Feb 11 '21 at 20:04

score 0 · Answer 2 · answered Feb 11 '21 at 19:44

You could do something like this:

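import pandas as pd

data=[
    {"id":"a","flag":True},
    {"id":"a","flag":True},
    {"id":"a","flag":False},
    {"id":"b","flag":False},
    {"id":"a","flag":True},
    {"id":"a","flag":True}]
df = pd.DataFrame(data)

# ids that have at least one row with flag set to True
true_ids = df[df["flag"]]["id"].unique()
# overwrite flag: True for every row whose id is in that set
df["flag"] = df["id"].isin(true_ids)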
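
The snippet above overwrites flag in place. If the original column should be kept, as in the question's desired output, a variant along these lines could write the result to a separate column instead. This is only a sketch: the mod_flag name is taken from the question, and the sample rows are the ones from the question's first code block.

```python
import pandas as pd

# Same sample rows as the question's first snippet
df = pd.DataFrame({
    "id":   ["a", "a", "a", "b", "a", "a"],
    "flag": [True, True, False, False, True, True],
})

# ids that have at least one True flag
true_ids = df.loc[df["flag"], "id"].unique()

# Keep flag untouched and store the propagated value in mod_flag
df["mod_flag"] = df["id"].isin(true_ids)
print(df)
```

Here every row with id a gets mod_flag set to True, while the single b row stays False.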
