-1

I am trying to write a script that alerts me when a username is duplicated (appears more than once) with a high level of depression. Here is a sample of the data:

| A header | Another header |
| -------- | -------------- |
| First    | row            |
| Second   | row            |

and here is my code:

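import pandas as pd

df = pd.read_csv("Path")

# Group rows by username (a plain string key, so get_group accepts a plain string)
username_grp = df.groupby('username')
filt = df['username'] == 'ali'  # boolean mask for one user (not used below)

print(username_grp.get_group("ali"))  # all rows for 'ali'
print(username_grp['level'].value_counts())  # count of each level per username
print(username_grp['level'].value_counts().loc['ali'])  # level counts for 'ali' only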
Andrej Kesely
Ali.M.Kamel
  • Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. – Prune Aug 12 '21 at 21:44
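For reference, a minimal frame of the kind this comment asks for could be built inline as sketched below. The rows are made up for illustration, since the original sample data was not posted; only the `username` and `level` column names come from the question's code.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame(
    {
        "username": ["ali", "ali", "ali", "sara", "sara", "omar"],
        "level": ["high", "high", "low", "high", "low", "low"],
    }
)

# Per-user count of each level, as in the question's groupby code.
level_counts = df.groupby("username")["level"].value_counts()
print(level_counts.loc["ali"])
```

Pasting a frame like this into the question lets answerers run the code without needing the CSV file.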

2 Answers

1

Use value_counts:

>>> df[df['level'] == 'high'].value_counts('username').gt(0).index.tolist()
['ali']
Corralien
  • Thank you for sharing, but what if I have multiple usernames that are duplicated in the same sheet? – Ali.M.Kamel Aug 12 '21 at 22:50
  • 1
    You will see the full list of duplicated usernames that have a high level of depression. – Corralien Aug 12 '21 at 22:54
  • It works fine, except that .gt doesn't work, since I'm trying to retrieve only names duplicated more than 2 times that have a high level of depression. – Ali.M.Kamel Aug 14 '21 at 09:49
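The issue raised in the last comment can be sketched as follows: `.gt(n)` returns a boolean Series, and reading `.index` from it returns every username regardless of the True/False values, so the booleans need to be applied as a mask first. The sample rows and the `min_count` name are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame(
    {
        "username": ["ali", "ali", "ali", "sara", "sara", "omar"],
        "level": ["high", "high", "high", "high", "low", "low"],
    }
)

min_count = 2  # illustrative threshold: keep users with more than 2 "high" rows

# Count "high" rows per username.
counts = df.loc[df["level"] == "high", "username"].value_counts()

# Apply the boolean result of .gt() as a mask before reading the index.
flagged = counts[counts.gt(min_count)].index.tolist()
print(flagged)  # ['ali']
```

Without the mask, `counts.gt(min_count).index.tolist()` would also list `sara`, who has only one "high" row in this sample.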
0

Thanks to @Corralien's line of code, I came up with this solution. It prints each username that is duplicated in the file: if a username appears 3 or more times with a high level in the other column, it is added to a list. Please comment here if you have a better solution!

import pandas as pd

df = pd.read_csv("PATH")

# Keep only the rows with a high level
high = df[df['level'] == 'high']

# Number of high-level rows per username
high_counts = high['username'].value_counts()

# Usernames with at least 3 high-level rows, in order of first appearance
medical_alert_list = [
    username for username in high['username'].unique()
    if high_counts[username] >= 3
]

alert = "\033[31m[!]\033[0m "  # red "[!]" prefix (ANSI escape codes)

for username in medical_alert_list:
    print("{}{}".format(alert, username))
Ali.M.Kamel