
I have a dataframe with this data.

import pandas as pd

data = {'Item':['2', '1', '2'],
    'IsAvailable':['True', 'False', 'False']}
df = pd.DataFrame(data)
================================

Item  |  IsAvailable
---------------------
  2   |     True
  1   |     False
  2   |     False

The dataframe contains the data shown above. As you can see, Item 2 has both True and False. In that case I want to keep a single record with just True.

Expected output:

Item  |  IsAvailable
---------------------
  2   |     True
  1   |     False

Please help me write the condition for this using Python pandas.

Thanks

Suhas_mudam

4 Answers


Since bool is also a kind of int, False sorts before True, so keeping the last row per Item keeps True whenever one exists:

df = df.sort_values('IsAvailable').drop_duplicates(subset=['Item'], keep='last')

This will reorder your items, though. Funny thing: it works even when the values are the strings 'True' and 'False', because 'False' sorts before 'True' alphabetically.
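If the original row order matters, a minimal sketch of the same idea is shown below. It assumes the strings are mapped to real booleans first and restores the order with `sort_index()`; the name `dedup` is chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['2', '1', '2'],
                   'IsAvailable': ['True', 'False', 'False']})

# Assumption: convert the strings to real booleans before sorting.
df['IsAvailable'] = df['IsAvailable'].map({'True': True, 'False': False})

# False sorts before True, so the last row per Item is True if any row is True.
dedup = (df.sort_values('IsAvailable')
           .drop_duplicates(subset=['Item'], keep='last')
           .sort_index())  # restore the original row order
print(dedup)
```

Because `drop_duplicates` keeps the original index labels, `sort_index()` is enough to put the surviving rows back in the order they first appeared.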

Oleg O

I think you first need to replace the strings True and False with booleans, if necessary, and then get the first row with True per group by using DataFrameGroupBy.idxmax for the indices and selecting with DataFrame.loc:

df['IsAvailable'] = df['IsAvailable'].map({'True':True, 'False':False})

df = df.loc[df.groupby('Item', sort=False)['IsAvailable'].idxmax()]
print(df)
  Item  IsAvailable
0    2         True
1    1        False
jezrael

If you just want the first occurrence of each Item (here that is the True row for Item 2, because it comes first):

Edit: as per @jezrael, you may want to map your strings to booleans first.

df['IsAvailable'] = df['IsAvailable'].replace({'True':True, 'False':False})
dfOut = df.drop_duplicates(subset="Item", keep='first')
print(dfOut)

  Item IsAvailable
0    2        True
1    1       False
braml1

Here is a solution that checks whether the string 'True' is one of the values assigned to each item. If so, the outcome for that item is True.

>>> df.groupby(['Item'])['IsAvailable'].apply(lambda x: 'True' in set(x))
Item
1    False
2     True
Name: IsAvailable, dtype: bool

If you want to keep the column name, use

>>> df.groupby(['Item'])['IsAvailable'].apply(lambda x: 'True' in set(x)).reset_index()
  Item  IsAvailable
0    1        False
1    2         True
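A related sketch, assuming the strings are first mapped to real booleans: `any()` on the grouped column then reports whether at least one row per item is True. The name `result` and the use of `sort=False` are choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['2', '1', '2'],
                   'IsAvailable': ['True', 'False', 'False']})

# Assumption: map the strings to booleans so that any() sees real True/False.
df['IsAvailable'] = df['IsAvailable'].map({'True': True, 'False': False})

# sort=False keeps the items in order of first appearance (2, then 1).
result = df.groupby('Item', sort=False)['IsAvailable'].any().reset_index()
print(result)
```

With `sort=False` the rows come back in the order requested in the question rather than sorted by Item.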
Nanna
  • Your previous code works i.e, with any(x)... true in x is giving me the wrong output. – Suhas_mudam Mar 05 '20 at 10:22
  • Yes, I noticed that as well. any(x) gave true as long as there was some value in the list that x is. Perhaps it is interpreting 'False' as string, and bool('False') is True in python. But the current version works, (lambda x: True in x). – Nanna Mar 05 '20 at 10:28
  • The output is not as per my request. Please check once the expected output. – Suhas_mudam Mar 05 '20 at 10:49
  • I'm sorry you're right. My mistake is explained here: https://stackoverflow.com/a/21320011/8446061. See updated answer. – Nanna Mar 05 '20 at 11:10
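
The two behaviours that appear to be behind the comments above can be reproduced in a few lines. This is only an illustrative sketch; the Series `s` and its index labels are made up for the example.

```python
import pandas as pd

# A non-empty string is truthy, so the string 'False' counts as True.
print(bool('False'))            # True
print(any(['False', 'False']))  # True

# `in` on a Series tests the index labels, not the values.
s = pd.Series(['False', 'False'], index=[10, 11])
print('False' in s)             # False: 'False' is not an index label
print(10 in s)                  # True: 10 is an index label
print('False' in s.values)      # True: membership test on the values
print('False' in set(s))        # True: set(s) is built from the values
```

This is why `any(x)` misfires on string data and why a membership test needs `set(x)` or `x.values` rather than the Series itself.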