A completely basic question - and forgive me if it is a duplicate.
set.seed(1)
df <-
data.frame(id=c('a', 'a', 'b', 'b', 'a'),
a=sample(1:10, size=5, replace=T),
b=sample(1:10, size=5, replace=T),
c=sample(1:10, size=5, replace=T))
Then,
> df
id a b c
1 a 3 9 3
2 a 4 10 2
3 b 6 7 7
4 b 10 7 4
5 a 3 1 8
To return the name of the column (a, b or c) with the largest value, or the name of the
column with the second-largest value if the top name equals the value in id, I use the function below.
FUN <- function(r) {
top <- names(r[,c('a', 'b', 'c')])[order(r[,c('a', 'b', 'c')], decreasing=T)]
ifelse(top[1] == r[['id']], top[2], top[1])
}
I can do:
FUN(df[1,]) #[1] "b"
and for all rows:
res <- NULL
for(i in 1:nrow(df)) {
res <- c(res, FUN(df[i,]))
}
And get
> res
[1] "b" "b" "c" "a" "c"
But how can I apply this? E.g. this is not working:
apply(df, 1, FUN)
I suspect the trouble is that FUN assumes a 1-row data frame, whereas apply passes it
a named character vector. For the first row that would be something like:
id a b c
"a" "3" "9" "3"
From ?apply:
If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
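To make the coercion concrete, here is a minimal sketch of what I think is going on, plus one way a function could cope with the character vector that apply hands over. The data frame is hard-coded from the printed output above so it does not depend on the RNG, and FUN2 is a hypothetical variant of FUN written for illustration, not something from the documentation.

```r
# Values copied from the printed df above (assumption: fixed data, no RNG).
df <- data.frame(id = c("a", "a", "b", "b", "a"),
                 a  = c(3, 4, 6, 10, 3),
                 b  = c(9, 10, 7, 7, 1),
                 c  = c(3, 2, 7, 4, 8),
                 stringsAsFactors = FALSE)

# apply() coerces the data frame with as.matrix(); because id is character,
# the whole matrix becomes character.
m <- as.matrix(df)
is.character(m)

# FUN2 (hypothetical name) expects a named character vector, i.e. one row
# of that matrix, and converts the value columns back to numbers.
FUN2 <- function(r) {
  cols <- c("a", "b", "c")
  vals <- as.numeric(r[cols])                  # "3" -> 3, padding is ignored
  top  <- cols[order(vals, decreasing = TRUE)] # column names, largest first
  if (top[1] == r[["id"]]) top[2] else top[1]  # skip the name held in id
}

res_apply <- unname(apply(df, 1, FUN2))
res_apply
```

If this reasoning is right, res_apply should match the result of the for loop above. Using if/else rather than ifelse is just a choice for a scalar condition.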