
I want to clear the location cell for every occurrence of a duplicated last name except the last one. For example, I want to clear the first two location occurrences for Balchuinas and keep only the third. The same goes for London and Fleck. I ONLY want to clear the location cells, not the complete rows.

Any help?


I tried `.drop_duplicates(keep='last')`, but that removes the whole row. I only want to clear the contents of the cells (or change them to NaN, if that's possible).

P.S. This is my first time asking a question, so I'm not sure how to paste the image without a link. Please help!

Pruthvi
Vish
    *This is my first time asking a question* - see [ask]. Also see [how to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – BigBen Feb 15 '23 at 19:33

2 Answers


Rather than removing the duplicate rows, I would suggest finding the duplicated last names and replacing their location values with NaN, keeping only the last one.

Something like this:

df.loc[df.duplicated(subset='Last Name', keep='last'), 'Location(Level 2)'] = float('nan')
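
A minimal, self-contained sketch of restricting that idea to the location column only. The column names `Last Name` and `Location(Level 2)` are taken from the other answer; the sample rows and location values are made up for illustration, since the real table was only posted as an image.

```python
import pandas as pd

# Hypothetical sample data; the location values are invented for illustration.
df = pd.DataFrame({
    "Last Name": ["Balchuinas", "Balchuinas", "Balchuinas",
                  "London", "London", "London", "Fleck"],
    "Location(Level 2)": ["A", "B", "C", "D", "E", "F", "G"],
})

# True for every row whose last name appears again further down,
# i.e. every occurrence except the last one.
is_earlier_duplicate = df.duplicated(subset="Last Name", keep="last")

# Clear only the location cell on those rows; the rows themselves stay.
df.loc[is_earlier_duplicate, "Location(Level 2)"] = float("nan")

print(df)
```

Passing `subset` matters here: without it, `duplicated` compares entire rows, so rows that share a last name but differ in any other column would not be flagged.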
Pruthvi
  1. Reverse the order of the dataframe.
  2. Create an empty list for the last names and store values in it as you iterate through the df.
  3. If the last name is already present in the list, change the 'Location(Level 2)' column to NaN. If it is not present, add it to the list and keep 'Location(Level 2)' unchanged.
  4. Restore the original order of the dataframe.
import numpy as np

# Reverse the dataframe (copy so the assignments below do not hit a view)
df = df.iloc[::-1].copy()

# List of the last names seen so far
seen_last_names = []

# Iterate through the rows
for index, row in df.iterrows():
    last_name = row["Last Name"]

    # First occurrence of the last name in the reversed df (the last one
    # in the original order): keep the location and record the name.
    if last_name not in seen_last_names:
        seen_last_names.append(last_name)

    # Last name already seen: set Location(Level 2) to NaN.
    else:
        df.loc[index, "Location(Level 2)"] = np.nan

# Restore the original order
df = df.iloc[::-1]
df
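
The reverse-and-track idea can also be sketched without an explicit loop by numbering each row from the end of its last-name group. This is only an illustration: the helper name `clear_all_but_last`, its `keep` parameter, and the sample values are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd


def clear_all_but_last(df, key="Last Name", target="Location(Level 2)", keep=1):
    """Return a copy of df with `target` set to NaN except on the last `keep` rows per `key`."""
    out = df.copy()
    # Number the rows of each group counting back from the end:
    # the last occurrence gets 0, the one before it 1, and so on.
    position_from_end = out.groupby(key).cumcount(ascending=False)
    out.loc[position_from_end >= keep, target] = np.nan
    return out


# Hypothetical sample data; the location values are invented for illustration.
sample = pd.DataFrame({
    "Last Name": ["Fleck", "London", "Fleck", "London", "Fleck"],
    "Location(Level 2)": ["A", "B", "C", "D", "E"],
})

result = clear_all_but_last(sample)
print(result)
```

Counting from the end plays the same role as reversing the dataframe, and it does not require the duplicated names to sit on adjacent rows.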
Dishant