2

i am trying to find the item belongs to which category based on mode by using below pandas data frame

data

   ITEM          CATEGORY
1  red saree     actual
2  red saree     actual
3  glass         lbh
4  glass         lbh
5  red saree     actual
6  red saree     lbh
7  glass         actual
8  bottle        actual
9  bottle        actual

what i am trying to figure out is by taking the unique items i want to assign a category to each unique item based on mode of category.

Expected output:

  ITEM          CATEGORY
1 red saree     actual
2 glass         lbh
3 bottle        actual

please tell me how to find out in python,Thanks in advance

tried:

import pandas as pd
from scipy.stats import mode
data_x = map(mode, data['category'])

I know it is not even related to that, but unable to find how to write the code for that

EdChum
  • 376,765
  • 198
  • 813
  • 562
surendra
  • 31
  • 1
  • 3
  • 5

2 Answers2

6

You can groupy the 'ITEM' and 'CATEGORY' columns and then call apply on the df groupby object and pass the function mode. We can then call reset_index and pass param drop=True so that the multi-index is not added back as a column as you already have those columns:

In [161]:

df.groupby(['ITEM', 'CATEGORY']).apply(pd.DataFrame.mode).reset_index(drop=True)
Out[161]:
        ITEM CATEGORY
0     bottle   actual
1      glass      lbh
2  red saree   actual
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • thanks for the answer! but i am supposed to say for example :if we have values instead of types in categories varible. i need to find out the max value from that. for eg: we have an item(glass) with 3 values(1.5,2.5,3.5), then the max of the categories for item(glass) is 3.5. And then i need to say that 3.5 belongs to 'actual' or 'LBH' . how can i find that using python – surendra May 14 '15 at 05:27
  • @surendra that's a new question please post as a brand new question, thanks – EdChum May 14 '15 at 08:41
1

Since you're already using Pandas, you could always try pandasql and work with that if you're more comfortable with SQL.

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

q = """
select 
    item, 
    category, 
    count(*) as cnt
from df
group by item, category
"""

df_counts = pysqldf(q)

q1 = """
select b.item, b.category 
from 
    (select 
        item, 
        max(cnt) as max_cnt
    from df_counts
    group by item) as a
    inner join df_counts as b
        on a.item = b.item 
            and a.max_cnt = b.cnt
"""

with the output:

>>> pysqldf(q1)
        item category
0  bottle      actual
1  glass          lbh
2  red saree   actual
chappers
  • 2,415
  • 14
  • 16