2

how to convert the row values to columns with the count of purchases without using the index of customer

DATA :

customer    fruits  veggies grocery
A   apple   carrot  brush
A   apple   carrot  brush
A   apple   onion   soap
A   banana  onion   soap
B   mango   onion   soap
B   mango   carrot  brush
B   banana  tomato  powder
B   banana  tomato  powder
C   apple   carrot  powder
C   mango   carrot  soap
C   mango   tomato  soap
C   banana  tomato  brush
D   banana  carrot  brush
D   banana  onion   soap
D   apple   tomato  powder
D   apple   tomato  powder

Expected Output :

customer    apple   mango   banana  carrot  onion   tomato  brush   soap    powder
A   3   0   1   2   2   0   2   2   0
B   0   2   2   1   1   2   1   1   2
C   1   2   1   2   0   2   1   2   1
D   2   0   2   1   1   2   1   1   2
pylearner
  • 1,358
  • 2
  • 10
  • 26
  • @jezrael , Can you answer this ? – pylearner Feb 20 '18 at 10:44
  • You first need to melt your DataFrame before pivoting: `df.melt(id_vars='customer').groupby(['customer', 'value']).size().unstack(fill_value=0)` – ayhan Feb 20 '18 at 10:50
  • @piRSquared, as it is the question you asked for, can u direct me on which question number will I get my answer ? – pylearner Feb 20 '18 at 10:56
  • @pylearner I was wrong. My apologies. – piRSquared Feb 20 '18 at 10:56
  • @pylearner There are other equally capable users besides jezrael who can answer your question, possibly better. So, please keep an open mind :) – cs95 Feb 20 '18 at 11:02
  • @cᴏʟᴅsᴘᴇᴇᴅ, for most of my questions, I got an answer from jezrael..thats the reason I asked him to answer, apologies if that meant in a wrong way. – pylearner Feb 21 '18 at 05:21
  • No offence taken! I only meant that in the sense that being biased towards a certain user is a little unfair to other contributors who are wiling to help ;) – cs95 Feb 21 '18 at 05:34
  • @cᴏʟᴅsᴘᴇᴇᴅ, Thats true. I dint meant to do this anyway. – pylearner Feb 21 '18 at 06:05

2 Answers2

3

Option 1
Use set_index + stack + get_dummies:

df.set_index('customer').stack().str.get_dummies().sum(level=0)

          apple  banana  brush  carrot  mango  onion  powder  soap  tomato
customer                                                                  
A             3       1      2       2      0      2       0     2       0
B             0       2      1       1      2      1       2     1       2
C             1       1      1       2      2      0       1     2       2
D             2       2      1       1      0      1       2     1       2

Option 2
Another one, slightly cleaner, using pd.crosstab:

v = df.set_index('customer').stack()
pd.crosstab(v.index.get_level_values(0), v.values)

col_0  apple  banana  brush  carrot  mango  onion  powder  soap  tomato
row_0                                                                  
A          3       1      2       2      0      2       0     2       0
B          0       2      1       1      2      1       2     1       2
C          1       1      1       2      2      0       1     2       2
D          2       2      1       1      0      1       2     1       2

crosstab is a specialised version of pivot_table, perfect for these sort of tabulation operations.

cs95
  • 379,657
  • 97
  • 704
  • 746
2

dot

d = pd.get_dummies(df)
d.columns = d.columns.str.split('_', expand=True)

c = d.pop('customer')

c.T.dot(d)

  fruits              veggies              grocery            
   apple banana mango  carrot onion tomato   brush powder soap
A      3      1     0       2     2      0       2      0    2
B      0      2     2       1     1      2       1      2    1
C      1      1     2       2     0      2       1      1    2
D      2      2     0       1     1      2       1      2    1

bincount, factorize

i, r = df.customer.factorize()
v = df.drop('customer', 1).values
j, c = pd.factorize(v.ravel())
n, m = len(r), len(c)

b = np.bincount(i.repeat(v.shape[1]) * m + j, minlength=n * m).reshape(n, m)

pd.DataFrame(b, r, c)

   apple  carrot  brush  onion  soap  banana  mango  tomato  powder
A      3       2      2      2     2       1      0       0       0
B      0       1      1      1     1       2      2       2       2
C      1       2      1      0     2       1      2       2       1
D      2       1      1      1     1       2      0       2       2

piRSquared
  • 285,575
  • 57
  • 475
  • 624