1

I have to clean a database

Id  FirstName LastName
12  Jannie    Pretorious
15  Elsa      Van Niekerk
14  Koos      SC$ 4 MMV Extraction SHift B

I want to remove the values that aren't LastNames, in this example id 14 : Koos SC$ 4 ...

I can identify a faulty lastname if there is more than two spaces within the cell value.

I want to iterate through the df and replace all values, with more than two spaces, with ''.

I have looked at this question Conditional Replace Pandas, but it didn't help me.

Expected output:

Id  FirstName LastName
12  Jannie    Pretorious
15  Elsa      Van Niekerk
14  Koos      
Llewellyn Hattingh
  • 306
  • 1
  • 5
  • 16

1 Answers1

2

Use Series.mask with Series.str.count:

df['LastName'] = df['LastName'].mask(df['LastName'].str.count(' ') > 2, '')
print (df)
   Id FirstName     LastName
0  12    Jannie   Pretorious
1  15      Elsa  Van Niekerk
2  14      Koos

Detail:

print (df['LastName'].str.count(' '))
0    0
1    1
2    5
Name: LastName, dtype: int64    

Alternative:

df['LastName'] = df['LastName'].mask(df['LastName'].str.split().str.len().sub(1) > 2, '')
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252