I am dealing with a large medical dataset, and I want to check whether some rows correspond to the same patient. The column that holds the patient's ID is ID Patient.
I want to create a new column that is "Yes" if the patient appears in more than one row, or "No" if the patient appears only once.
This is the code I wrote:

df['Repeated'] = 'No' # New Column

for i in range(0,len(df)):
    for f in range(0,len(df)):
        if df['ID NP'].iloc[i] == df['ID NP'].iloc[f]:
            df['ID NP'].iloc[i] = 'Yes'
        else:
            df['ID NP'].iloc[i] = 'No'

However, this operation takes too much time. Is there a faster way to do it?

SherylHohman
bonaqua

1 Answer

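Not only does it take too much time, it also produces wrong results: the loop assigns to the 'ID NP' column instead of 'Repeated', and because every row is also compared with itself (when i == f), each ID trivially finds a match.

You only need to count how many times each ID occurs. So count them: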

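stats = {}  # maps each patient ID to the number of rows it appears in
for id_np in df['ID NP']:  # iterate over the column values directly
    stats[id_np] = stats.get(id_np, 0) + 1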
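
For reference, pandas can build the same tally without an explicit loop. This is a minimal sketch, assuming a small made-up frame (the sample IDs are hypothetical; only the 'ID NP' column name comes from the question):

```python
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
df = pd.DataFrame({'ID NP': [10, 11, 10, 12, 11, 10]})

# value_counts() returns a Series mapping each distinct ID to its row count.
counts = df['ID NP'].value_counts()

# Convert to a plain dict so it has the same shape as the hand-built tally.
stats = counts.to_dict()
print(stats)
```

The resulting dict can be used exactly like the one built by the loop.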

Now you only need to iterate over the row positions once:

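# 'Repeated' was already initialised to 'No' for every row
repeated_col = df.columns.get_loc('Repeated')  # integer position of that column
for i in range(len(df)):
    id_np = df['ID NP'].iloc[i]
    if stats[id_np] > 1:
        # this ID occurs in more than one row
        df.iloc[i, repeated_col] = 'Yes'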
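
The whole column can also be filled in one vectorized step, which is likely to be much faster on a large frame than a Python-level loop. This is a sketch rather than a definitive solution, and it assumes the hypothetical sample data below. `Series.duplicated(keep=False)` marks every row whose value occurs more than once, and `numpy.where` turns that boolean mask into labels:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({'ID NP': [10, 11, 10, 12, 11, 10]})

# keep=False flags all occurrences of a repeated ID, not just the later ones.
is_repeated = df['ID NP'].duplicated(keep=False)

# Map the boolean mask to the 'Yes' / 'No' labels the question asks for.
df['Repeated'] = np.where(is_repeated, 'Yes', 'No')
print(df)
```

With the default `keep='first'`, the first occurrence of each ID would be left unmarked, so `keep=False` is the setting that matches "appears in more than one row".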
bipll