I have a DataFrame with more than 10,000 rows. I need to drop a row if the same name appears more than once within that row.

Example: (image not included)

Expected: (image not included)

The second row is deleted because "Chris" appears twice in the same row. I am reasonably new to programming and am not sure where to begin.

Chris Adams
Vrle
  • Kindly post actual data, not pictures: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – sammywemmy Mar 07 '20 at 01:37
  • What is the issue, exactly? Please be more specific. _I am reasonably new to programming and am not sure where to even begin_ Stack Overflow is not a substitute for guides, tutorials, or documentation, which are likely what you need. – AMC Mar 07 '20 at 01:41

1 Answer

One idea is to select the "Name" columns and count the unique names in each row with nunique(axis=1). If that count is less than the number of name columns, the row contains a duplicate. Use this as a boolean mask to keep only the rows without duplicates:

import pandas as pd

# Example data
df = pd.DataFrame({'Name1': ['chris', 'mark', 'chris', 'john'],
                   'Age1': [20, 30, 35, 40],
                   'Name2': ['joe', 'steve', 'chris', 'eric']})


#    Name1  Age1  Name2
# 0  chris    20    joe
# 1   mark    30  steve
# 2  chris    35  chris
# 3   john    40   eric

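# Select the columns whose label contains "Name"
name_cols = df.filter(like='Name').columns
# Keep rows where the number of distinct names equals the number of name columns
df_new = df[df[name_cols].nunique(axis=1).eq(len(name_cols))]
print(df_new)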

[out]

   Name1  Age1  Name2
0  chris    20    joe
1   mark    30  steve
3   john    40   eric
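
Two caveats with the mask above may be worth handling on real data. nunique() ignores missing values by default, so a row with an empty name cell has fewer unique names than name columns and would be dropped even though nothing repeats. The comparison is also case-sensitive, so "Chris" and "chris" count as different names. Below is a minimal sketch of one way to cover both, assuming every column matched by the filter holds strings. The helper name drop_rows_with_repeated_names and the sample data are made up for illustration and do not come from the question.

```python
import pandas as pd


def drop_rows_with_repeated_names(df, like="Name"):
    """Return the rows of df in which no non-missing name appears more than once.

    Hypothetical helper: assumes every column whose label contains `like`
    holds strings (or missing values).
    """
    name_cols = df.filter(like=like).columns
    # Normalize case and surrounding whitespace so "Chris" and "chris " match.
    names = df[name_cols].apply(lambda col: col.str.strip().str.lower())
    # nunique() skips missing values, so compare against the number of
    # non-missing names in each row rather than the number of name columns.
    has_repeat = names.nunique(axis=1) < names.notna().sum(axis=1)
    return df[~has_repeat]


# Illustrative data: row 1 has a missing name, row 2 repeats "chris" with different casing.
df = pd.DataFrame({'Name1': ['chris', 'mark', 'Chris', 'john'],
                   'Age1': [20, 30, 35, 40],
                   'Name2': ['joe', None, 'chris', 'eric']})

result = drop_rows_with_repeated_names(df)
print(result)
```

With this data, only the row at index 2 is dropped. The row with the missing name is kept, and the original spelling of each name is left unchanged in the result because the normalization is applied to a copy of the name columns.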
Chris Adams