2

I have a dataframe df with three columns item, store and week. It looks like this:

 item           store         week
24128          272568         1203
25554          272568         1203
24177          272568         1203
72000          272568         1203
72001          272568         1203
24128          272568         1204
25554          272568         1204
24177          272568         1204
72000          272568         1204
72001          272568         1204
-----          ------         ----
24128          272569         1203
25554          272569         1203
24177          272569         1203
72000          272569         1203
72001          272569         1203
24128          272569         1204
25554          272569         1204
24177          272569         1204
72000          272569         1204
72001          272569         1204

I would like to see each item exists in how many store. I tried the following:

aggregate(store~item, data = df,FUN = "length")

and the function summaryBy in doBy package:

summaryBy(store~item,data = df,FUN = c(length))

But, the function length returns number of store with double counting, i.e, it counts each store for each week. How do I avoid this double counting and get unique length of store for each item?

Rnovice
  • 333
  • 1
  • 5
  • 18

1 Answers1

2

We need to get the length on the unique elements

aggregate(store~item, data = df,FUN = function(x) length(unique(x)))

Or if we are using dplyr

library(dplyr)
df %>%
  group_by(item) %>%
  summarise(storen = n_distinct(store))
akrun
  • 874,273
  • 37
  • 540
  • 662