R language - count of multiple columns group by one column

Question

I got data frame in R that looks like this:

> df
   c1 c2 c3
1: 10 c1 i1
2: 10 c1 i2
3: 10 c1 i3
4: 10 c2 i1
5: 10 c2 i2
6: 10 c2 i3
7: 20 c11 i1
8: 20 c11 i2
9: 20 c11 i3
10: 20 c12 i1
11: 20 c12 i2
12: 20 c12 i3

I need to sum distinct counts of columns c2 and c3 group by c1 - to get the following result:

10 2 3
20 2 3

How would I get that done in R?

Thanks

.. the row count is not very large (thousands perhaps) so performance is not the issue ... — Zoran Krunic, Sep 14 '16 at 01:06
> df # A tibble: 12 x 3 c1 c2 c3 1 10 c1 i1 2 10 c1 i2 3 10 c1 i3 4 10 c2 i1 5 10 c2 i2 6 10 c2 i3 7 20 c11 i1 8 20 c11 i2 9 20 c11 i3 10 20 c12 i1 11 20 c12 i2 12 20 c12 i3 > df[, lapply(.SD, uniqueN), by=c1, .SDcols=c("c2","c3")] Error in `[.tbl_df`(df, , lapply(.SD, uniqueN), by = c1, .SDcols = c("c2", : unused arguments (by = c1, .SDcols = c("c2", "c3")) — Zoran Krunic, Sep 14 '16 at 01:29
Related - http://stackoverflow.com/questions/28173677/r-how-to-count-the-number-of-unique-values-by-group/28173835 — thelatemail, Sep 14 '16 at 02:00

score 3 · Accepted Answer · answered Sep 14 '16 at 01:35

3

Using base R aggregate

aggregate(cbind(c2,c3)~c1, df, function(x) length(unique(x)))


#  c1 c2 c3
#1 10  2  3
#2 20  2  3

answered Sep 14 '16 at 01:35

Ronak Shah

377,200
20
156
213

score 1 · Answer 2 · answered Sep 14 '16 at 02:35

1

As the OP's dataset looks like a data.table, we can use data.table methods. Convert the 'data.frame' to 'data.table' (setDT(df1) - if it is a 'data.frame'), grouped by 'c1', we loop through the Subset of Data.table (.SD) and find the length of unique elements (uniqueN).

library(data.table)
setDT(df1)[, lapply(.SD, uniqueN), by= c1]
#   c1 c2 c3
#1: 10  2  3
#2: 20  2  3

answered Sep 14 '16 at 02:35

akrun

874,273
37
540
662

thanks for both answers; in either case - if I want some column to be the count of different values (like above) and for some other to be sum of the values( for numeric columns) - basically two different functions applied to two different subset of columns - with the same group by one column as above, would that be possible in one line? – Zoran Krunic Sep 14 '16 at 18:06
@ZoranKrunic Yes, it is possible, `setDT(df1)[, c(lapply(.SD[, names1vec, with = FALSE], uniqueN), lapply(.SD[, names2vec, with = FALSE], sum)), by = c1]` – akrun Sep 14 '16 at 18:07

R language - count of multiple columns group by one column

2 Answers2