2

i have a dataframe like this

df = pd.DataFrame({
    'User':['101','101','102','102','102'],
    'Product':['x','x','x','z','z'],
    'Country':['India,Brazil','India','India,Brazil,Japan','India,Brazil','Brazil']
})

and i want to get country and product combination count by user wise like below

first split the countries then combine with product and take the count.

wanted output:

enter image description here

Anton vBR
  • 18,287
  • 5
  • 40
  • 46
Kumar AK
  • 987
  • 3
  • 10
  • 23

2 Answers2

3

Here is one way combining other answers on SO (which just shows the power of searching :D)

import pandas as pd

df = pd.DataFrame({
    'User':['101','101','102','102','102'],
    'Product':['x','x','x','z','z'],
    'Country':['India,Brazil','India','India,Brazil,Japan','India,Brazil','Brazil']
})

# Making use of: https://stackoverflow.com/a/37592047/7386332
j = (df.Country.str.split(',', expand=True).stack()
                                           .reset_index(drop=True, level=1)
                                           .rename('Country'))
df = df.drop('Country', axis=1).join(j)

# Reformat to get desired Country_Product
df = (df.drop(['Country','Product'], 1)
      .assign(Country_Product=['_'.join(i) for i in zip(df['Country'], df['Product'])]))

df2 = df.groupby(['User','Country_Product'])['User'].count().rename('Count').reset_index()

print(df2)

Returns:

  User Country_Product  count
0  101        Brazil_x      1
1  101         India_x      2
2  102        Brazil_x      1
3  102        Brazil_z      2
4  102         India_x      1
5  102         India_z      1
6  102         Japan_x      1
Anton vBR
  • 18,287
  • 5
  • 40
  • 46
3

How about get_dummies

df.set_index(['User','Product']).Country.str.get_dummies(sep=',').replace(0,np.nan).stack().sum(level=[0,1,2])
Out[658]: 
User  Product        
101   x        Brazil    1.0
               India     2.0
102   x        Brazil    1.0
               India     1.0
               Japan     1.0
      z        Brazil    2.0
               India     1.0
dtype: float64
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Nice indeed +1.Should be significantly faster. Need some extra operations though to get desired output. – Anton vBR Jul 07 '18 at 19:30