Apply FUN row-wise on data frame with integer and character variables

Question

A completely basic question - and forgive me if it is a duplicate.

set.seed(1)
df <- 
  data.frame(id=c('a', 'a', 'b', 'b', 'a'),
             a=sample(1:10, size=5, replace=T),
             b=sample(1:10, size=5, replace=T),
             c=sample(1:10, size=5, replace=T))

Then,

> df
  id  a  b c
1  a  3  9 3
2  a  4 10 2
3  b  6  7 7
4  b 10  7 4
5  a  3  1 8

For each row I want the name of the column (a, b or c) holding the largest value; if that name equals the row's id value, I want the name of the column with the second-largest value instead. I use the function below.

FUN <- function(r) {
  top <- names(r[,c('a', 'b', 'c')])[order(r[,c('a', 'b', 'c')], decreasing=T)]
  ifelse(top[1] == r[['id']], top[2], top[1])
}

I can do:

FUN(df[1,]) #[1] "b"

and for all rows:

res <- NULL
for (i in 1:nrow(df)) {
  res <- c(res, FUN(df[i, ]))
}

And get

> res
[1] "b" "b" "c" "a" "c"

But how can I do this with apply? E.g. this does not work:

apply(df, 1, FUN)

I suspect the trouble is that FUN assumes a 1-row data frame, whereas apply passes it a named character vector, like this for the first row:

 id   a   b   c 
"a" "3" "9" "3"

From ?apply:

If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.

Do you have to use that function? Or are you looking for a way to get the max columns? — Sotos, Jun 16 '17 at 14:09
I am trying to understand how, in general, to do something row-wise using a function that expect a dataframe row. — user3375672, Jun 16 '17 at 14:11

Mike H. · Answer 1 · 2017-06-16T14:23:04.043

Another option is to make some minor modifications to your FUN. The issue you ran into is that apply first coerces the data frame to a matrix (via as.matrix), so each row reaches FUN as a vector rather than a 1-row data frame. Since your id column is character, the a/b/c columns are coerced to character as well. Knowing this, we can modify FUN slightly to convert them back to numeric for ordering:

FUN <- function(r) {
  top <- c('a', 'b', 'c')[order(as.numeric(r[c('a', 'b', 'c')]), decreasing=T)]
  ifelse(top[1] == as.character(r['id']), top[2], top[1])
}

apply(df, 1, FUN)
#[1] "b" "b" "c" "a" "c"

To see this in a little more detail, run the code below: it prints each row as apply passes it, showing that every row arrives as a named character vector.

apply(df, 1, function(x) {print(x); print(class(x)); return(NULL)})
#  id    a    b    c 
# "a" " 3" " 9"  "3" 
#[1] "character"
#  id    a    b    c 
# "a" " 4" "10"  "2" 
#[1] "character"
#  id    a    b    c 
# "b" " 6" " 7"  "7" 
#[1] "character"
#  id    a    b    c 
# "b" "10" " 7"  "4" 
#[1] "character"
#  id    a    b    c 
# "a" " 3" " 1"  "8" 
#[1] "character"
#NULL
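The coercion can also be checked directly, without going through apply. This is a minimal sketch: the data frame is typed in by hand from the printed df in the question (an assumption, so that the example does not depend on the random number generator), and the names m_all and m_num are purely illustrative.

```r
# Data typed in from the printed df in the question (assumed values)
df <- data.frame(id = c("a", "a", "b", "b", "a"),
                 a  = c(3L, 4L, 6L, 10L, 3L),
                 b  = c(9L, 10L, 7L, 7L, 1L),
                 c  = c(3L, 2L, 7L, 4L, 8L),
                 stringsAsFactors = FALSE)

# apply() converts a data frame with as.matrix(); a single character
# column forces the whole matrix to character
m_all <- as.matrix(df)
typeof(m_all)
# [1] "character"

# With only the integer columns, the matrix keeps its integer type
m_num <- as.matrix(df[c("a", "b", "c")])
typeof(m_num)
# [1] "integer"
```

So if the function only needed the a/b/c columns, applying over df[c("a", "b", "c")] would avoid the conversion; it is the id column that makes as.numeric necessary here.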

Yeah, I suspected that but wanted to avoid the `as.numeric` and `as.character` . — user3375672, Jun 16 '17 at 14:21
I included the `as.character` because I didn't know if `r["id"]` was a factor or not. — Mike H., Jun 16 '17 at 14:22

score 1 · Accepted Answer · answered Jun 16 '17 at 14:12

If you must use your function, you can split the data frame into a list of 1-row data frames and apply the function to each:

sapply(split(df, 1:nrow(df)), f1)
#  1   2   3   4   5 
#"b" "b" "c" "a" "c"

NOTE: I renamed your FUN to f1, since FUN is the name that many R functions (apply, sapply, etc.) use for their function argument.
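The same idea also works without building the intermediate list: iterating over row indices and passing df[i, ] keeps each row a 1-row data frame with its column types intact. A minimal sketch, assuming the hand-typed data below matches the printed df in the question; f1 here is a rewritten stand-in for the question's function (it unlists the numeric columns before ordering), not the original code.

```r
# Data typed in from the printed df in the question (assumed values)
df <- data.frame(id = c("a", "a", "b", "b", "a"),
                 a  = c(3L, 4L, 6L, 10L, 3L),
                 b  = c(9L, 10L, 7L, 7L, 1L),
                 c  = c(3L, 2L, 7L, 4L, 8L),
                 stringsAsFactors = FALSE)

# Stand-in for the question's function: expects a 1-row data frame
f1 <- function(r) {
  vals <- unlist(r[c("a", "b", "c")])   # named integer vector
  top  <- names(vals)[order(vals, decreasing = TRUE)]
  # skip the top column if its name equals the row's id
  if (top[1] == r[["id"]]) top[2] else top[1]
}

# Iterate over row indices; vapply checks that each result is one string
res <- vapply(seq_len(nrow(df)),
              function(i) f1(df[i, ]),
              character(1))
res
# [1] "b" "b" "c" "a" "c"
```

vapply is used rather than sapply only so that an unexpected return type fails loudly; sapply would give the same vector here.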

answered Jun 16 '17 at 14:12

Sotos

Ah, exactly! You use `split` to get a list by row that can then be fed into `lapply`. Great, and very general. – user3375672 Jun 16 '17 at 14:14
I used `sapply` here to get a vector, but you can use `lapply` too if you want the results in a list. – Sotos Jun 16 '17 at 14:15
Sure sure `sapply` is what I want (as in `lapply(x, f1, simplify=F, USE.NAMES=F)` ) – user3375672 Jun 16 '17 at 14:18
