I'm working with a pandas DataFrame that has a column holding an ID code for each client. Each ID code is repeated across several rows. Another column holds a boolean flag, True or False. I am trying to adjust the table so that, for every ID code, if at least one flag is True, all of that ID's flags are set to True. For example, a client ID code might appear in 10 rows, with the flag False in 9 of them and True in one; I want all 10 rows set to True. Here is what I tried:

import pandas as pd

data=[
    {"id":"a","flag":True},
    {"id":"a","flag":True},
    {"id":"a","flag":False},
    {"id":"b","flag":False},
    {"id":"a","flag":True},
    {"id":"a","flag":True}]
df = pd.DataFrame(data)

df.groupby('id').filter(lambda x:(x['flag']==True).any())['mod_flag'] = True

df[df['mod_flag'] != True] = False

But it is throwing a key error on the second line for mod_flag. Any help would be greatly appreciated--thanks!

EDIT:

Adding in a sample data table here for the desired output:

id  flag   mod_flag
a   False  False
a   False  False
b   False  True
b   False  True
b   True   True
c   True   True
c   True   True
c True True

Where the rows with ID = b are the ones that need to be changed.

Lle.4
  • Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). Show where the intermediate results differ from what you expected. We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. – Prune Feb 11 '21 at 19:14
  • Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. – Prune Feb 11 '21 at 19:14
  • Sorry about that @Prune, the edits have been made. Thanks! – Lle.4 Feb 11 '21 at 19:41
  • Yes, but your edits do not address the problems. Please follow the information in the links I provided. – Prune Feb 11 '21 at 19:42
  • How about now? @Prune – Lle.4 Feb 12 '21 at 18:35

2 Answers

  • groupby() to get all related rows together
  • transform() to get a value for each row
  • simple pandas series any()
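import numpy as np
import pandas as pd

df = pd.DataFrame({"client_id":np.random.randint(1,5,8),
             "flag":np.random.choice([False,True], 8)}).sort_values("client_id")

# Broadcast each client's any() result back to every row of that client
df.assign(newflag=df.groupby("client_id")["flag"].transform(lambda s: s.any()))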

   client_id   flag  newflag
3          1   True     True
6          1  False     True
0          2   True     True
2          2   True     True
7          2   True     True
1          3   True     True
4          3  False     True
5          4  False    False
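
Because the sample above is random, here is a minimal sketch (not part of the original answer) of the same groupby/transform idea applied to the rows from the question's desired-output table. It assumes the string alias "any" is an acceptable stand-in for the lambda, and it writes the result to a mod_flag column as the question does.

```python
import pandas as pd

# Rows copied from the question's desired-output table (id and flag columns only).
df = pd.DataFrame({
    "id":   ["a", "a", "b", "b", "b", "c", "c"],
    "flag": [False, False, False, False, True, True, True],
})

# transform("any") computes any() per id and broadcasts it back to every row of that id.
df["mod_flag"] = df.groupby("id")["flag"].transform("any")

print(df)
```

With this data, only the rows with id b change: their mod_flag becomes True because one b row already had flag set to True.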
Rob Raymond

You could do something like this:

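import pandas as pd

data=[
    {"id":"a","flag":True},
    {"id":"a","flag":True},
    {"id":"a","flag":False},
    {"id":"b","flag":False},
    {"id":"a","flag":True},
    {"id":"a","flag":True}]
df = pd.DataFrame(data)

# IDs that have at least one row with flag == True
true_ids = df[df["flag"]]["id"].unique()
# Flag every row whose ID is in that set (this overwrites the original column)
df["flag"] = df["id"].isin(true_ids)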
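
If the original flag column should be kept, a variant of the same isin idea can write to a separate column instead. This is only a sketch under that assumption; the mod_flag name is taken from the question, not from this answer.

```python
import pandas as pd

# Same rows as the question's sample data.
df = pd.DataFrame({
    "id":   ["a", "a", "a", "b", "a", "a"],
    "flag": [True, True, False, False, True, True],
})

# Collect the ids that have at least one True flag.
true_ids = df.loc[df["flag"], "id"].unique()

# Mark every row whose id is in that set, leaving the original flag column untouched.
df["mod_flag"] = df["id"].isin(true_ids)

print(df)
```

Here every a row ends up with mod_flag True, while the single b row stays False.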
Ángel Igualada