1

I have a large dataset with repeated measures over 5 time periods.

   2012  2009  2006  2003  2000
    3     1     4     4     1
    5     3     2     2     3
    6     7     3     5     6 

I want to add a new column, which is the number of unique values among years 2000 to 2012. e.g.,

   2012  2009  2006  2003  2000  nunique
    3     1     4     4     1      3
    5     3     2     2     3      3
    6     7     3     5     6      4

I am working in R and, if it helps, there are only 14 possible different values of the measured value at each time period.

I found this page: Count occurrences of value in a set of variables in R (per row) and tried the various solutions offered on it. What it gives me however is a count of each value, not the number of unique values. Other similar questions on here seem to ask about counting number of unique values within a variable /column, rather than across each row. Any suggestions would be appreciated.

Community
  • 1
  • 1
user3251223
  • 265
  • 1
  • 7
  • 15

4 Answers4

2

Here's one alternative

> df$nunique <- apply(df, 1, function(x) length(unique(x)))
> df
  2012 2009 2006 2003 2000 nunique
1    3    1    4    4    1       3
2    5    3    2    2    3       3
3    6    7    3    5    6       4
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
  • 1
    Please note: if your dataframe has NAs in it, this will count those as unique values. Ammend with: df$nunique <- apply(df, 1, function(x) length(unique(na.omit(x)))) - I've added an 'na.omit' applied to 'x'. – Jordan Collins Mar 17 '16 at 15:41
1

If you have a large dataset, you may want to avoid looping over the rows, but use a faster framework, like S4Vectors:

df <- data.frame('2012'=c(3,5,6),
             '2009'=c(1,3,7),
             '2006'=c(4,2,3),
             '2003'=c(4,2,5),
             '2000'=c(1,3,6))

dup <- S4Vectors:::duplicatedIntegerPairs(as.integer(as.matrix(df)), row(df))
dim(dup) <- dim(df)
rowSums(!dup)

Or, the matrixStats package:

m <- as.matrix(df)
mode(m) <- "integer"
rowSums(matrixStats::rowTabulates(m) > 0)
Michael Lawrence
  • 1,031
  • 5
  • 6
0

The trick is to use 'apply' and assign each row to a variable (e.g. x). You can then write a custom function, in this case one that uses 'unique' and 'length' to get the answer that you want.

df <- data.frame('2012'=c(3,5,6), '2009'=c(1,3,7), '2006'=c(4,2,3), '2003'=c(4,2,5), '2000'=c(1,3,6))

df$nunique = apply(df, 1, function(x) {length(unique(x))})
Sam Danziger
  • 106
  • 4
0

try this one out:

sapply(data, function(x) length(unique(x)))
Mudit Gupta
  • 401
  • 4
  • 8