I have the following data frame:

      id cluster       username 2001 2002 2003 2004 2005 2006 2007  2008  2009  2010   2011
1 268672  Type 1          Vlaam    0    0    0    0    0    0 5896 18976 13552 20508 106939
2 351003  Type 2 WikiCleanerBot    0    0    0    0    0    0    0 17049  8468 22834   7470
   2012  2013  2014  2015  2016
1 83874 97447 59677 88661 41133
2 11219 83245 28015 40464 25053

I need to add a final variable that tells me, for each row, which column in the 2001, 2002, ..., 2016 series contains the row's maximum. I wrote this code:

cluster$yearMod <- apply(cluster,1,function(x) {
  years <- x[4:19]
  as.numeric(names(years)[match(max(years),years)])
})

But this gave me:

[1] 2015 2015

This is clearly wrong: the correct values are 2011 and 2013.

Can you help me?

Jaap
Léo Joubert

1 Answer

If you want the column that holds the maximum of each row, you can use max.col. Create a logical index ('i1') of the columns whose names end in digits (the year columns) with grepl, subset the dataset with it (df1[i1]), get the position of the maximum value in each row with max.col, and use that position to pick the corresponding column name.

i1 <- grepl("[0-9]+$", names(df1))
df1$newVar <- names(df1)[i1][max.col(df1[i1], "first")]
df1$newVar
#[1] "2011" "2013"
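
To make the intermediate steps visible, here is a minimal sketch on an abbreviated copy of the data. Only four of the year columns are kept, which is a simplification for illustration and not the full dataset; the variable 'idx' is a name introduced here. With ties.method = "first", max.col returns the earliest column when a row has tied maxima, whereas its default is to break ties at random.

```r
# Abbreviated copy of the question's data (assumption: only four year
# columns are kept to keep the example short).
df1 <- data.frame(
  id       = c(268672L, 351003L),
  username = c("Vlaam", "WikiCleanerBot"),
  `2010`   = c(20508L, 22834L),
  `2011`   = c(106939L, 7470L),
  `2013`   = c(97447L, 83245L),
  `2015`   = c(88661L, 40464L),
  check.names = FALSE   # keep "2010" etc. as column names
)

# Logical index: TRUE for columns whose names end in digits
i1 <- grepl("[0-9]+$", names(df1))
i1
#[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE

# Position (within the year columns) of each row's maximum
idx <- max.col(df1[i1], ties.method = "first")
idx
#[1] 2 3

# Map the positions back to column names
names(df1)[i1][idx]
#[1] "2011" "2013"
```

Wrap the last expression in as.numeric() if the new variable should be numeric rather than character.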

If you prefer to use apply, another option is which.max:

names(df1)[i1][apply(df1[i1], 1, which.max)]
#[1] "2011" "2013"
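
A likely reason the code in the question went wrong: apply converts the data frame to a matrix first, and because the data frame also has character columns, that matrix is character. max then compares strings rather than numbers. The sketch below illustrates this on an abbreviated copy of the data (four year columns only, an assumption for brevity) and shows one way to keep the apply approach by subsetting the year columns before calling it. The exact wrong values reported in the question depend on the full dataset, so they are not reproduced here; 'm', 'years' and 'yearMod' are names used for illustration.

```r
# Abbreviated copy of the question's data (assumption: four year columns only)
df1 <- data.frame(
  id       = c(268672L, 351003L),
  username = c("Vlaam", "WikiCleanerBot"),
  `2010`   = c(20508L, 22834L),
  `2011`   = c(106939L, 7470L),
  `2013`   = c(97447L, 83245L),
  `2015`   = c(88661L, 40464L),
  check.names = FALSE
)

# apply() works on as.matrix(df1); mixed column types give a character matrix
m <- as.matrix(df1)
is.character(m)
#[1] TRUE

# Strings are compared character by character, not by numeric value
"9" > "10"
#[1] TRUE

# Keeping only the year columns gives a numeric matrix, so which.max
# returns the position of the true numeric maximum in each row
years <- df1[grepl("[0-9]+$", names(df1))]
yearMod <- as.numeric(names(years)[apply(years, 1, which.max)])
yearMod
#[1] 2011 2013
```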

data

df1 <- structure(list(id = c(268672L, 351003L), cluster = c("Type 1", 
"Type 2"), username = c("Vlaam", "WikiCleanerBot"), `2001` = c(0L, 
0L), `2002` = c(0L, 0L), `2003` = c(0L, 0L), `2004` = c(0L, 0L
), `2005` = c(0L, 0L), `2006` = c(0L, 0L), `2007` = c(5896L, 
0L), `2008` = c(18976L, 17049L), `2009` = c(13552L, 8468L), 
`2010` = c(20508L, 
22834L), `2011` = c(106939L, 7470L), `2012` = c(83874L, 11219L
), `2013` = c(97447L, 83245L), `2014` = c(59677L, 28015L), 
`2015` = c(88661L, 
40464L), `2016` = c(41133L, 25053L)), .Names = c("id", "cluster", 
 "username", "2001", "2002", "2003", "2004", "2005", "2006", "2007", 
 "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", 
 "2016"), row.names = c("1", "2"), class = "data.frame")
akrun