
I have a data.table as follows:

library(data.table)
library(haven)
df1 <- fread(
    "A   B   C  iso   year   
     0   B   1  NLD   2009   
     1   A   2  NLD   2009   
     0   Y   3  AUS   2011   
     1   Q   4  AUS   2011   
     0   NA  7  NLD   2008   
     1   0   1  NLD   2008   
     0   1   3  AUS   2012",
  header = TRUE
)

I want to count the unique combinations of iso and year (which would be NLD 2009, AUS 2011, NLD 2008 and AUS 2012, so 4).

I tried df1[,uniqueN(.(iso, year))] and df1[,uniqueN(c("iso", "year"))]

The first one gives an error, and the second one gives the answer 2, whereas I am looking for 4 unique combinations.

What am I doing wrong here?

(As I am doing this with a big dataset of strings, I would prefer not to combine the columns and then test.)

Tom
  • Your first attempt (`df1[,uniqueN(.(iso, year))]`) fails because `uniqueN`'s first argument should be an atomic vector or a data.frame/data.table, but the argument you actually passed is a list (`.(iso, year)`). In the second case, you passed an atomic vector with two distinct elements (`c("iso", "year")`) instead of the data on which you want to compute the unique values. – B. Christian Kamgang Jul 11 '21 at 11:54
  • Please study the help text: `uniqueN(x, by= `; `x`: A `data.table`; `by`: `character` or `integer` vector indicating which combinations of columns from `x` to use for uniqueness checks. – Henrik Jul 11 '21 at 13:05
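
To illustrate the two comments above, here is a minimal sketch. It assumes a small stand-in table `dt` (a hypothetical name, not from the question) holding the same iso/year pairs as `df1`, and shows why the second attempt returns 2 while the `by` form returns 4.

```r
library(data.table)

# Hypothetical stand-in for df1: same iso/year pairs as in the question
dt <- data.table(
  iso  = c("NLD", "NLD", "AUS", "AUS", "NLD", "NLD", "AUS"),
  year = c(2009, 2009, 2011, 2011, 2008, 2008, 2012)
)

# c("iso", "year") is only a character vector of length 2,
# so uniqueN() counts its two distinct strings, not the column contents
n_names <- uniqueN(c("iso", "year"))

# Passing the table itself and naming the columns in `by`
# counts the distinct iso/year rows
n_pairs <- uniqueN(dt, by = c("iso", "year"))
```

With this data, `n_names` is 2 and `n_pairs` is 4.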

2 Answers


You can solve it as follows using the data.table package:

df1[, uniqueN(.SD), .SDcols=c("iso", "year")]

or

uniqueN(df1, by=c("iso", "year"))
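
If you want to cross-check the count, the following sketch may help. It assumes a small stand-in table `dt` (a hypothetical name) with the same iso/year pairs as `df1`, and gets the same number in two other ways: by deduplicating on the two columns, and by grouping on them. Neither pastes the columns together.

```r
library(data.table)

# Hypothetical stand-in for df1: same iso/year pairs as in the question
dt <- data.table(
  iso  = c("NLD", "NLD", "AUS", "AUS", "NLD", "NLD", "AUS"),
  year = c(2009, 2009, 2011, 2011, 2008, 2008, 2012)
)

# Keep one row per iso/year pair, then count the remaining rows
n_rows <- nrow(unique(dt, by = c("iso", "year")))

# Group sizes per pair; the number of groups is the same count
per_pair <- dt[, .N, by = .(iso, year)]
n_groups <- nrow(per_pair)
```

Both `n_rows` and `n_groups` should agree with the `uniqueN(..., by = ...)` result; `per_pair` additionally shows how many rows fall into each combination.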

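As an alternative to the data.table approach, count from dplyr does it very nicely. It lists each iso/year combination with its number of rows, so the number of rows in the result (here 4) is the number of unique combinations: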

library(dplyr)
df1 %>% count(iso, year)

Output:

   iso year n
1: AUS 2011 2
2: AUS 2012 1
3: NLD 2008 2
4: NLD 2009 2
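
If only the single number is needed rather than the table, a sketch along these lines should work in dplyr as well. The data frame `d` is a hypothetical stand-in with the same iso/year pairs as `df1`.

```r
library(dplyr)

# Hypothetical stand-in for df1: same iso/year pairs as in the question
d <- data.frame(
  iso  = c("NLD", "NLD", "AUS", "AUS", "NLD", "NLD", "AUS"),
  year = c(2009, 2009, 2011, 2011, 2008, 2008, 2012)
)

# Drop duplicate iso/year rows, then count what is left
n_a <- d %>% distinct(iso, year) %>% nrow()

# n_distinct() accepts several vectors and counts unique combinations
n_b <- n_distinct(d$iso, d$year)
```

Both `n_a` and `n_b` come out as 4 for this data.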
bird