0

I have a data like the following and I want to know the percentage of people who have bought from more than 2 brands:

     hh_code    brand
     3032145       536
     3032145       53
     3032145       534
     324063        536
     204128        53
     84787         536

and I want to the number of brands bought by each household - like the following:

   hh_code    unique_ brand
   3032145    3
   847827     1
   204128     1
    84787     1

I have tried using table but it is just giving me frequencies. Would appreciate any insights!

lll
  • 1,049
  • 2
  • 13
  • 39
  • This is a pretty common question. You could also use `table(df$hh_code)` which would work for your example or if HHs buy the same brand multiple times, `table(unique(df)$hh_code)` should work for a data.frame with 2 columns. – lmo Jul 18 '16 at 17:02
  • ok, thanks! I tried table(df$hh_code, df$brand) before and it did not work – lll Jul 18 '16 at 17:08

2 Answers2

1

We can use data.table

library(data.table)
setDT(df1)[, .(unique_brand = uniqueN(brand)), by = hh_code]
#   hh_code unique_brand
#1: 3032145            3
#2:  324063            1
#3:  204128            1
#4:   84787            1
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Simple, base R solution using tapply:

num_brands <- tapply(df$brand, df$hh_code, length)
ge2_brands <- num_brands > 2
Zelazny7
  • 39,946
  • 18
  • 70
  • 84