Select rows based on some value in a column and remove duplicates

Question

I have a dataset in Pandas where I have columns like:

brand    categories  
nike     sandals
nike     sneakers   
adidas   sneakers
adidas   sneakers
puma     boots
puma     boots
fila     sneakers

I want to keep only the rows whose category contains "sneakers" and remove all duplicate rows.

[How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Danila Ganchar, Nov 03 '20 at 14:05
What did you do to remove the duplicates? Can you share the code that removes the duplicates and an example of what is printed? It is hard to help you here without knowing why your solution is failing. – ItIsEntropy, Nov 03 '20 at 14:08
To remove the duplicates, I did this: df2.categories.drop_duplicates(keep='first') and df2.brand.drop_duplicates(keep='first') – lea, Nov 03 '20 at 14:26

Anmol Dudani · Accepted Answer · 2020-11-03T17:40:48.343

df1 = df[df.categories.str.contains('sneakers', case=False)]

Use this on your original pandas dataset to extract the rows you need. Then delete the original dataframe, or simply stop using it, i.e.

del df

because your new dataframe is df1

edited Nov 03 '20 at 17:40

answered Nov 03 '20 at 14:15

Hello, when I do that, it says empty data frame. The rows are like this: Sneakers,Women,Shoes Under $100 \ Shoes,Sneakers,Women's Shoes \ etc. – lea Nov 03 '20 at 14:25
@lea check now. – Vishnudev Krishnadas Nov 03 '20 at 14:34
@Vishnudev after I do that, do I need to create a new data frame with only the needed items? I don't understand what you mean by "delete or just forget". – lea Nov 03 '20 at 14:46
@lea Did this explanation help? – Anmol Dudani Nov 03 '20 at 17:42
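
The "empty data frame" result in the comments is what a case-sensitive match would give on strings like "Sneakers,Women,...". The sketch below illustrates this, assuming the match was originally case-sensitive; the brand values are made up, and only the two category strings come from the comment.

```python
import pandas as pd

# Hypothetical frame: the category strings are the two quoted in the
# comment above, the brand values are placeholders for illustration.
df = pd.DataFrame({
    "brand": ["nike", "adidas"],
    "categories": [
        "Sneakers,Women,Shoes Under $100",
        "Shoes,Sneakers,Women's Shoes",
    ],
})

# Case-sensitive match: "sneakers" never appears in lower case here.
strict = df[df["categories"].str.contains("sneakers")]

# Case-insensitive match: "Sneakers" inside the longer string is found.
relaxed = df[df["categories"].str.contains("sneakers", case=False)]

print(len(strict), len(relaxed))
```

Because `str.contains` looks for the pattern anywhere in the string, the comma-separated category lists do not need to be split first.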

score 0 · Answer 2 · answered Nov 03 '20 at 14:51

Select the rows whose category contains "sneakers", then drop the duplicate rows:

condition = df.categories.str.contains('sneakers', case=False)
df = df[condition].drop_duplicates(keep='first')

Output

    brand categories
1    nike   sneakers
2  adidas   sneakers
6    fila   sneakers
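
For reference, here is a self-contained sketch of the two steps above, run on the sample data from the question. The frame is rebuilt by hand from the question's table, and `result` is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "brand": ["nike", "nike", "adidas", "adidas", "puma", "puma", "fila"],
    "categories": ["sandals", "sneakers", "sneakers", "sneakers",
                   "boots", "boots", "sneakers"],
})

# Step 1: boolean mask, True where the category contains "sneakers".
condition = df["categories"].str.contains("sneakers", case=False)

# Step 2: filter with the mask, then drop rows that are identical in
# every column, keeping the first occurrence of each.
result = df[condition].drop_duplicates(keep="first")

print(result)
```

Note that `drop_duplicates` is called on the whole dataframe here, so a row is only dropped when both brand and category repeat. Calling it on a single column, as in `df2.categories.drop_duplicates(...)`, deduplicates that column in isolation and returns a Series rather than filtered rows.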

Hello, the condition code worked, but when I try to run the df line, the output says: Boolean array expected for the condition, not object – lea Nov 03 '20 at 15:47
What does `print(condition)` give? Check your code, might be a typo? – Vishnudev Krishnadas Nov 03 '20 at 15:52
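
The thread does not say what caused this error. One possible cause, offered here only as an assumption, is missing values in the categories column: `str.contains` can then return a mask that is not purely boolean. Passing `na=False` treats missing categories as non-matches, as in this sketch with made-up data.

```python
import pandas as pd

# Hypothetical data with one missing category (assumption: the real
# dataset may contain missing values like this).
df = pd.DataFrame({
    "brand": ["nike", "adidas", "puma"],
    "categories": ["Sneakers,Women", None, "boots"],
})

# na=False makes missing values count as "no match", so the mask
# contains only True/False and can be used for row selection.
mask = df["categories"].str.contains("sneakers", case=False, na=False)

result = df[mask].drop_duplicates()
print(result)
```

If the column has no missing values, checking `condition.dtype` and `type(condition)` would be a reasonable next step, since the filter expects a boolean Series.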
