
I have a pandas DataFrame like this (a small sample):

time name val1 val2
0500 unit1 1 nan
0500 unit1 nan 1
0500 unit1 1 1
0500 unit2 1 nan
0500 unit3 1 nan
0500 unit3 nan 1
0500 unit3 1 1

What I want is this:

time name val1 val2
0500 unit1 1 1
0500 unit2 1 nan
0500 unit3 1 1

I have a list of the units with duplicate rows: duplicates = ['unit1', 'unit3']

What I attempted is this:

for unit in duplicates:
    temp_df = df.loc[df['name'] == unit].dropna()
    update_df = update_df.append(temp_df)

but as I iterate, the NaN rows I dropped for one unit end up back in the DataFrame when I process the other duplicate units. How else can I do this with a DataFrame? Thank you.
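
For reference, here's a runnable sketch of the loop approach with the pieces I glossed over filled in (assuming update_df should start from the non-duplicate rows, and using pd.concat since DataFrame.append is deprecated) — I'd still prefer a cleaner DataFrame-native way:

import pandas as pd
import numpy as np

# sample data from above
df = pd.DataFrame({
    'time': ['0500'] * 7,
    'name': ['unit1', 'unit1', 'unit1', 'unit2', 'unit3', 'unit3', 'unit3'],
    'val1': [1, np.nan, 1, 1, 1, np.nan, 1],
    'val2': [np.nan, 1, 1, np.nan, np.nan, 1, 1],
})
duplicates = ['unit1', 'unit3']

update_df = df.loc[~df['name'].isin(duplicates)]  # keep non-duplicate rows
for unit in duplicates:
    temp_df = df.loc[df['name'] == unit].dropna()
    update_df = pd.concat([update_df, temp_df])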

pariskey
  • Does this answer your question? [Drop all duplicate rows across multiple columns in Python Pandas](https://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-across-multiple-columns-in-python-pandas) – sushanth Aug 09 '21 at 14:55
  • @sushanth I don't believe so. The columns I need to drop have nans, not duplicates. – pariskey Aug 09 '21 at 15:01

1 Answer


Try this:

First, sort the values with sort_values(). Then build the condition and store it in a variable; duplicated() flags repeated names and isin() restricts the check to the units in duplicates, so the condition gives a boolean Series. Finally, pass that boolean Series to the DataFrame to filter out the unwanted rows:

# sort so the most complete row for each name comes first (NaN sorts last)
df = df.sort_values(['name', 'val1', 'val2'])
# keep a row unless its name is repeated AND that name is in duplicates
m = ~(df.duplicated(subset=['name']) & df['name'].isin(duplicates))
# Finally, filter with the boolean mask:
out = df[m]
# OR
out = df.loc[m]

Output of out:

   time     name    val1    val2
2   500     unit1   1.0     1.0
3   500     unit2   1.0     NaN
6   500     unit3   1.0     1.0
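
As a side note, a shorter alternative (a sketch, assuming you want exactly one row per time/name pair): groupby(...).first() takes the first non-NaN value per column within each group, so each unit collapses to a single complete row.

# first() skips NaN within each group, so each (time, name) pair
# collapses to one row built from its first non-NaN values
out = df.groupby(['time', 'name'], as_index=False).first()

This collapses every name, not just the ones in duplicates, but for this sample it gives the same result because the non-duplicated units only have one row each.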
Anurag Dabas
  • Thank you! However, when I run this, I only get the rows at index 2 and 6 and lose the row at index 3. – pariskey Aug 09 '21 at 15:22
  • @pariskey Since you already ran your for loop, that may have deleted the row at index 3, so kindly check your dataframe again. It works completely fine on my side and gives me the rows at index 2, 3 and 6, so can you please recheck? – Anurag Dabas Aug 09 '21 at 17:35