0

I have a pandas DataFrame with columns like:

brand    categories  
nike     sandals
nike     sneakers   
adidas   sneakers
adidas   sneakers
puma     boots
puma     boots
fila     sneakers

I want to keep only the rows whose category is "sneakers" and remove all duplicate rows.
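
For anyone who wants to reproduce this, the table above can be built as follows. This is only a sketch: the variable name `df` is an assumption (it matches the name used in the answers), and the values are copied from the table.

```python
import pandas as pd

# Sample data copied from the table in the question.
# The name `df` is an assumption, chosen to match the answers.
df = pd.DataFrame({
    "brand": ["nike", "nike", "adidas", "adidas", "puma", "puma", "fila"],
    "categories": ["sandals", "sneakers", "sneakers", "sneakers",
                   "boots", "boots", "sneakers"],
})

print(df)
```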

Bill Huang
lea
    [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Danila Ganchar Nov 03 '20 at 14:05
    What did you do to remove the duplicates? Can you share the code that removes the duplicates and an example of what is printed? It is hard to help you here without knowing why your solution is failing. – ItIsEntropy Nov 03 '20 at 14:08
    To remove the duplicates, I did this: df2.categories.drop_duplicates(keep='first') and df2.brand.drop_duplicates(keep='first') – lea Nov 03 '20 at 14:26
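
A likely reason the calls in the last comment appeared to do nothing: `drop_duplicates` on a single column returns a new Series and leaves the DataFrame itself untouched. The sketch below assumes `df2` holds the sample data from the question; the names `unique_categories` and `deduplicated` are made up for illustration.

```python
import pandas as pd

# Assumption: df2 contains the sample rows from the question.
df2 = pd.DataFrame({
    "brand": ["nike", "nike", "adidas", "adidas", "puma", "puma", "fila"],
    "categories": ["sandals", "sneakers", "sneakers", "sneakers",
                   "boots", "boots", "sneakers"],
})

# Series.drop_duplicates returns a new Series; df2 is not modified.
unique_categories = df2.categories.drop_duplicates(keep="first")
print(len(df2))                    # 7, the DataFrame still has every row
print(unique_categories.tolist())  # ['sandals', 'sneakers', 'boots']

# DataFrame.drop_duplicates compares whole rows, and its result
# has to be assigned to a name to be kept.
deduplicated = df2.drop_duplicates(keep="first")
print(len(deduplicated))           # 5
```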

2 Answers

0
df1 = df[df.categories.str.contains('sneakers', case=False)]

Use this on your original DataFrame to select the rows you need. You can then delete the original DataFrame, or simply stop using it, e.g.

del df

because your new DataFrame is df1. Note that df1 still contains the duplicate rows; call df1.drop_duplicates() to remove them.

0

Select the rows whose category is sneakers, then drop the duplicate rows:

condition = df.categories.str.contains('sneakers', case=False)
df = df[condition].drop_duplicates(keep='first')

Output

    brand categories
1    nike   sneakers
2  adidas   sneakers
6    fila   sneakers
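
For completeness, here is a self-contained sketch that reproduces the output above from the question's sample data. The names `result` and `result_reset` are hypothetical, and the `reset_index` step is an optional extra that is not part of the answer.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "brand": ["nike", "nike", "adidas", "adidas", "puma", "puma", "fila"],
    "categories": ["sandals", "sneakers", "sneakers", "sneakers",
                   "boots", "boots", "sneakers"],
})

# Boolean mask: True where the category contains "sneakers" (case-insensitive).
condition = df["categories"].str.contains("sneakers", case=False)

# Filter first, then drop rows that are identical in every column.
result = df[condition].drop_duplicates(keep="first")
print(result)

# Optional: renumber the rows 0..n-1 instead of keeping the original labels.
result_reset = result.reset_index(drop=True)
```

Two things to keep in mind: `str.contains` matches substrings, so `df["categories"].eq("sneakers")` may be the better fit if an exact match is wanted, and if the column can contain missing values, passing `na=False` to `str.contains` keeps the mask strictly boolean.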
Vishnudev Krishnadas