R: get the unique values of categorical data sorted by another categorical variable

Question

I have data like the following, and I want to know the percentage of people who have bought from more than 2 brands:

     hh_code    brand
     3032145       536
     3032145       53
     3032145       534
     324063        536
     204128        53
     84787         536

To get there, I want the number of distinct brands bought by each household, like the following:

   hh_code    unique_brand
   3032145    3
   324063     1
   204128     1
   84787      1

I have tried using `table`, but it just gives me frequencies. I would appreciate any insights!

This is a pretty common question. You could also use `table(df$hh_code)` which would work for your example or if HHs buy the same brand multiple times, `table(unique(df)$hh_code)` should work for a data.frame with 2 columns. — lmo, Jul 18 '16 at 17:02
ok, thanks! I tried table(df$hh_code, df$brand) before and it did not work — lll, Jul 18 '16 at 17:08

score 1 · Accepted Answer · answered Jul 18 '16 at 16:48

We can use data.table. Here `uniqueN(brand)` counts the distinct brands within each `hh_code` group:

library(data.table)
setDT(df1)[, .(unique_brand = uniqueN(brand)), by = hh_code]
#   hh_code unique_brand
#1: 3032145            3
#2:  324063            1
#3:  204128            1
#4:   84787            1
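
The question also asks for the percentage of households with more than 2 brands. As a minimal sketch, assuming the sample rows from the question are rebuilt by hand as `df1`, the per-household counts can be turned into that percentage. The names `brand_counts` and `pct_gt2` are illustrative and not part of the original answer:

```r
library(data.table)

# Sample rows copied from the question (rebuilt by hand for illustration)
df1 <- data.table(
  hh_code = c(3032145, 3032145, 3032145, 324063, 204128, 84787),
  brand   = c(536, 53, 534, 536, 53, 536)
)

# Distinct brands per household, as in the answer above
brand_counts <- df1[, .(unique_brand = uniqueN(brand)), by = hh_code]

# Share of households with more than 2 distinct brands, as a percentage
pct_gt2 <- brand_counts[, mean(unique_brand > 2) * 100]
pct_gt2
```

Taking `mean()` of a logical vector gives the proportion of `TRUE` values, so no separate count and division is needed.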

akrun


That was an impressively quick response. – dayne Jul 18 '16 at 16:50

score 0 · Answer 2 · answered Jul 18 '16 at 17:36

Simple, base R solution using tapply:

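# Count distinct brands per household (length alone would also count repeat purchases)
num_brands <- tapply(df$brand, df$hh_code, function(x) length(unique(x)))
# TRUE for households that bought from more than 2 brands
gt2_brands <- num_brands > 2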
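
To see how repeat purchases affect the count, here is a small base R sketch. It assumes the question's sample data with one extra repeated row for household 84787, which is added here only for illustration. The names `n_rows`, `n_distinct` and `pct_gt2` are hypothetical:

```r
# Question's sample data plus one assumed repeat purchase (84787 buys brand 536 twice)
df <- data.frame(
  hh_code = c(3032145, 3032145, 3032145, 324063, 204128, 84787, 84787),
  brand   = c(536, 53, 534, 536, 53, 536, 536)
)

# Rows per household: repeat purchases are counted each time
n_rows <- tapply(df$brand, df$hh_code, length)

# Distinct brands per household: repeat purchases are counted once
n_distinct <- tapply(df$brand, df$hh_code, function(x) length(unique(x)))

# Percentage of households that bought from more than 2 distinct brands
pct_gt2 <- mean(n_distinct > 2) * 100
pct_gt2
```

The two counts agree on the question's original rows and differ only for the household with the repeated row, which is why the distinct count is the safer choice when the same brand can appear more than once.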

Zelazny7
