0

I got data frame in R that looks like this:

> df
   c1 c2 c3
1: 10 c1 i1
2: 10 c1 i2
3: 10 c1 i3
4: 10 c2 i1
5: 10 c2 i2
6: 10 c2 i3
7: 20 c11 i1
8: 20 c11 i2
9: 20 c11 i3
10: 20 c12 i1
11: 20 c12 i2
12: 20 c12 i3

I need to sum distinct counts of columns c2 and c3 group by c1 - to get the following result:

10 2 3
20 2 3

How would I get that done in R?

Thanks

thelatemail
  • 91,185
  • 12
  • 128
  • 188
Zoran Krunic
  • 197
  • 1
  • 2
  • 12
  • .. the row count is not very large (thousands perhaps) so performance is not the issue ... – Zoran Krunic Sep 14 '16 at 01:06
  • I am getting an error running this: – Zoran Krunic Sep 14 '16 at 01:29
  • > df # A tibble: 12 x 3 c1 c2 c3 1 10 c1 i1 2 10 c1 i2 3 10 c1 i3 4 10 c2 i1 5 10 c2 i2 6 10 c2 i3 7 20 c11 i1 8 20 c11 i2 9 20 c11 i3 10 20 c12 i1 11 20 c12 i2 12 20 c12 i3 > df[, lapply(.SD, uniqueN), by=c1, .SDcols=c("c2","c3")] Error in `[.tbl_df`(df, , lapply(.SD, uniqueN), by = c1, .SDcols = c("c2", : unused arguments (by = c1, .SDcols = c("c2", "c3")) – Zoran Krunic Sep 14 '16 at 01:29
  • .. df is a data frame in this case – Zoran Krunic Sep 14 '16 at 01:35
  • Related - http://stackoverflow.com/questions/28173677/r-how-to-count-the-number-of-unique-values-by-group/28173835 – thelatemail Sep 14 '16 at 02:00

2 Answers2

3

Using base R aggregate

aggregate(cbind(c2,c3)~c1, df, function(x) length(unique(x)))


#  c1 c2 c3
#1 10  2  3
#2 20  2  3
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

As the OP's dataset looks like a data.table, we can use data.table methods. Convert the 'data.frame' to 'data.table' (setDT(df1) - if it is a 'data.frame'), grouped by 'c1', we loop through the Subset of Data.table (.SD) and find the length of unique elements (uniqueN).

library(data.table)
setDT(df1)[, lapply(.SD, uniqueN), by= c1]
#   c1 c2 c3
#1: 10  2  3
#2: 20  2  3
akrun
  • 874,273
  • 37
  • 540
  • 662
  • thanks for both answers; in either case - if I want some column to be the count of different values (like above) and for some other to be sum of the values( for numeric columns) - basically two different functions applied to two different subset of columns - with the same group by one column as above, would that be possible in one line? – Zoran Krunic Sep 14 '16 at 18:06
  • @ZoranKrunic Yes, it is possible, `setDT(df1)[, c(lapply(.SD[, names1vec, with = FALSE], uniqueN), lapply(.SD[, names2vec, with = FALSE], sum)), by = c1]` – akrun Sep 14 '16 at 18:07