I have a dataframe with more than 10k rows. I need to drop a row if a name repeats itself within that row.
The second row is deleted as "Chris" appears twice in the same row. I am reasonably new to programming and am not sure where to even begin.
One idea might be to filter on the "Name" columns and use nunique to count the distinct names across axis 1. If that count is less than the number of name columns, the row contains a duplicate. Use this logic to boolean index:
import pandas as pd

# Example data
df = pd.DataFrame({'Name1': ['chris', 'mark', 'chris', 'john'],
                   'Age1': [20, 30, 35, 40],
                   'Name2': ['joe', 'steve', 'chris', 'eric']})
# Name1 Age1 Name2
# 0 chris 20 joe
# 1 mark 30 steve
# 2 chris 35 chris
# 3 john 40 eric
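# Columns whose label contains 'Name' (here Name1 and Name2)
name_cols = df.filter(like='Name').columns
# Keep only the rows where every name column holds a distinct value
df_new = df[df[name_cols].nunique(axis=1).eq(len(name_cols))]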
print(df_new)
[out]
   Name1  Age1  Name2
0  chris    20    joe
1   mark    30  steve
3   john    40   eric
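One caveat worth keeping in mind: nunique ignores missing values by default, so a row with an empty name would have fewer unique names than name columns and would be dropped even though nothing repeats. Below is a minimal sketch of a variant that compares against the number of non-missing names instead. The function name drop_rows_with_repeated_names and the missing value in the sample data are assumptions made for illustration, not part of the original question.

```python
import pandas as pd


def drop_rows_with_repeated_names(df, like="Name"):
    """Return only the rows whose name columns hold no repeated value.

    Hypothetical helper: `like` is the substring used to pick name columns.
    """
    # Select the columns whose label contains `like` (e.g. Name1, Name2).
    names = df.filter(like=like)
    # nunique skips NaN/None by default, so compare against the count of
    # non-missing names rather than the total number of name columns.
    no_repeats = names.nunique(axis=1).eq(names.count(axis=1))
    return df[no_repeats]


# Sample data with one missing name (an assumption for this sketch).
df = pd.DataFrame({'Name1': ['chris', 'mark', 'chris', 'john'],
                   'Age1': [20, 30, 35, 40],
                   'Name2': ['joe', None, 'chris', 'eric']})

result = drop_rows_with_repeated_names(df)
print(result.index.tolist())  # [0, 1, 3]
```

Row 1 is kept here because its single name does not repeat. With the plain comparison against len(name_cols) it would have been dropped.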