The first column in the dataframe is a random list of student IDs. I would like to find out whether any StudentID occurs twice. If so, I would like to print the two lines where it happens.

StudentID   Name
s123456     Michael
s123789     Peter
s123789     Thomas 
s123579     Marie

I would like to print out:

"Two students have the same student id in line {} and {}"
jpp
  • And what have you tried so far? [This question may be downvoted because it shows neither research effort nor attempt](http://idownvotedbecau.se/noattempt/) – vincrichaud Jun 15 '18 at 13:41
  • Possible duplicate of [How do I get a list of all the duplicate items using pandas in python?](https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python) – Derek Brown Jun 15 '18 at 14:23

2 Answers

df = df.reset_index()  # So a row value is visible after the groupby

# Check how the df looks
print(df)
   index StudentID     Name
0      0   s123456  Michael
1      1   s123789    Peter
2      2   s123789   Thomas
3      3   s123579    Marie

def my_func(x):
    count = len(x)
    rows = " and ".join(x.astype(str))
    return "{} students have the same student ID in line {}".format(count, rows)

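# Keep only rows whose StudentID occurs more than once, then build one message per ID.
# The result goes into a new name so the original df is not overwritten.
messages = df[df.StudentID.duplicated(keep=False)].groupby('StudentID')['index'].unique().map(my_func)

# Print results
for message in messages:
    print(message)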

2 students have the same student ID in line 1 and 2
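
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the sample data from the question and the default RangeIndex, so "line" means the zero-based row label. The helper name `duplicate_messages` is made up for this example and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "StudentID": ["s123456", "s123789", "s123789", "s123579"],
    "Name": ["Michael", "Peter", "Thomas", "Marie"],
})


def duplicate_messages(frame, column="StudentID"):
    """Return one message per ID that occurs more than once (hypothetical helper)."""
    # keep=False marks every occurrence of a repeated value, not only the later ones
    dup_rows = frame[frame[column].duplicated(keep=False)]
    messages = []
    # sort=False keeps the IDs in order of first appearance
    for _, group in dup_rows.groupby(column, sort=False):
        lines = " and ".join(str(label) for label in group.index)
        messages.append(
            "{} students have the same student id in line {}".format(len(group), lines)
        )
    return messages


messages = duplicate_messages(df)
for message in messages:
    print(message)
```

Because the row labels are read straight from `group.index`, no `reset_index()` call is needed here. If the frame has a non-default index, the messages will show those labels instead of positions.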
Dillon

Here's one way using f-strings, available in Python 3.6+:

# example data
StudentID   Name
s123456     Michael
s123789     Peter
s123789     Thomas 
s123577     Joe
s123456     Mark
s123458     Andrew

# get duplicated StudentIDs
dups = df.loc[df['StudentID'].duplicated(keep=False), 'StudentID'].unique()

# iterate duplicates
for stid in dups:
    dup_idx = df[df['StudentID'] == stid].index.tolist()
    print(f'{len(dup_idx)} Students have the same student id in lines: {dup_idx}')

2 Students have the same student id in lines: [0, 4]
2 Students have the same student id in lines: [1, 2]
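
The loop above filters the whole frame once per duplicated ID. As a rough alternative, the row labels can be collected for every ID in a single groupby pass and then filtered by length. This is only a sketch under the same example data. The names `rows_by_id` and `dup_rows` are invented for illustration.

```python
import pandas as pd

# Example data from this answer
df = pd.DataFrame({
    "StudentID": ["s123456", "s123789", "s123789", "s123577", "s123456", "s123458"],
    "Name": ["Michael", "Peter", "Thomas", "Joe", "Mark", "Andrew"],
})

# Collect the row labels of every StudentID in one pass;
# sort=False keeps the IDs in order of first appearance
rows_by_id = df.index.to_series().groupby(df["StudentID"], sort=False).agg(list)

# Keep only the IDs that appear on more than one row
dup_rows = rows_by_id[rows_by_id.map(len) > 1]

for stid, rows in dup_rows.items():
    print(f"{len(rows)} students have the same student id ({stid}) in lines: {rows}")
```

For a handful of duplicates the difference is negligible. The single pass mainly helps when a large frame contains many repeated IDs.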
jpp