Apply, dataframes, and booleans don't work together?

Question

In the following, logical operators don't seem to work properly.

a = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
b = c('a', 'b', 'c', 'de', 'f', 'g')
c = c(1, 2, 3, 4, 5, 6)
d = c(0, 0, 0, 0, 0, 1)

wtf = data.frame(a, b, c, d)
wtf$huh = apply(wtf, 1, function(row) {
    if (row['a'] == T) { return('we win') }
    if (row['c'] < 5) { return('hooray') }
    if (row['d'] == 1) { return('a thing') }
    return('huh?')
})

Producing:

> wtf
      a  b c d     huh
1  TRUE  a 1 0  hooray
2 FALSE  b 2 0  hooray
3  TRUE  c 3 0  hooray
4 FALSE de 4 0  hooray
5  TRUE  f 5 0    huh?
6  TRUE  g 6 1 a thing

Naively, one would expect `we win` in rows 1, 3, 5, and 6.

Can someone explain to me (1) why it does this, (2) how to fix it so that it doesn't happen, (3) why all my logical columns seemingly turn into characters, and (4) how to apply a function to the rows of a data frame in a type-safe way?

When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Using `apply` with data.frames is not a good idea because it coerces to a matrix first which can change all your data types. — MrFlick, Apr 18 '18 at 20:21
Agree with @MrFlick the problem is almost certainly the use of `apply`. — joran, Apr 18 '18 at 20:23
There are some slick tools for operating on data frames by row in **purrr**, but frankly a simple for loop would be a fine place to start. — joran, Apr 18 '18 at 20:30
To be clear, `apply` is working correctly; its correct behavior is just confusing. `apply` literally coerces your data frame to a matrix, and a matrix can only hold a single data type. Hence, with mixed columns, everything typically becomes character (or whatever common type all the columns can be coerced to). — joran, Apr 18 '18 at 20:32

Gregor Thomas · Accepted Answer · score 7 · 2018-04-18T20:41:39.917

Why does this happen? Because `apply` is made for matrices. When you give it a data frame, the first thing that happens is that it gets converted to a matrix:

m = as.matrix(wtf)
m 
#      a       b    huh    huh1    
# [1,] " TRUE" "a"  "huh?" "hooray"
# [2,] "FALSE" "b"  "huh?" "huh?"  
# [3,] " TRUE" "c"  "huh?" "hooray"
# [4,] "FALSE" "de" "huh?" "huh?"  
# [5,] " TRUE" "f"  "huh?" "hooray"
# [6,] " TRUE" "g"  "huh?" "hooray"

When that happens, your different data types are lost (everything becomes character), and data frame-style indexing no longer works on the matrix itself:

m['a']
# [1] NA
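To make the coercion concrete, here is a small sketch that recreates the question's `wtf` and inspects the first row the way `apply` hands it to the function. It assumes the `as.matrix()` behaviour visible in the output above: the logical column is run through `format()`, which pads `TRUE` to the width of `"FALSE"`, giving `" TRUE"`. Inside `apply`, each row is a *named* character vector, so `row['a']` does work, but `" TRUE" == TRUE` compares strings and is `FALSE`, while `row['c'] < 5` becomes the string comparison `"1" < "5"`.

```r
# Recreate the question's data frame.
wtf <- data.frame(
  a = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  b = c("a", "b", "c", "de", "f", "g"),
  c = c(1, 2, 3, 4, 5, 6),
  d = c(0, 0, 0, 0, 0, 1)
)

# apply() starts with as.matrix(), which turns every column into character.
m <- as.matrix(wtf)
typeof(m)          # "character"

# Each row passed to the function is a named character vector.
row1 <- m[1, ]
row1["a"]          # the string " TRUE", padded by format()

# TRUE is coerced to "TRUE", which differs from " TRUE",
# so the 'we win' branch is never taken.
row1["a"] == TRUE  # FALSE

# The numeric test is now a string comparison ("1" < "5"),
# which is why rows 1 to 4 end up as 'hooray'.
row1["c"] < 5      # TRUE
```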

Solution? Use a simple for loop:

wtf$huh1 = NA
for (i in 1:nrow(wtf)) {
        wtf$huh1[i] = if(wtf[i, 'a']) "hooray" else "huh?"
}
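Applied to all four rules from the question, the same loop pattern might look like the following sketch (assuming the question's `wtf`). Indexing the data frame with `wtf[i, "a"]` returns a length-one logical, and `wtf[i, "c"]` a length-one number, so no type is lost and rows 1, 3, 5, and 6 get `we win`.

```r
wtf <- data.frame(
  a = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  b = c("a", "b", "c", "de", "f", "g"),
  c = c(1, 2, 3, 4, 5, 6),
  d = c(0, 0, 0, 0, 0, 1)
)

wtf$huh <- NA_character_             # preallocate a character column
for (i in seq_len(nrow(wtf))) {      # seq_len() also copes with 0-row data frames
  # Indexing the data frame keeps each column's own type,
  # so these comparisons behave as the question intended.
  wtf$huh[i] <- if (wtf[i, "a"]) {
    "we win"
  } else if (wtf[i, "c"] < 5) {
    "hooray"
  } else if (wtf[i, "d"] == 1) {
    "a thing"
  } else {
    "huh?"
  }
}
wtf$huh
```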

If you have a function `foo`, then:

wtf$huh2 = NA
for (i in 1:nrow(wtf)) {
        wtf$huh2[i] = foo(wtf[i, 'a'])
}

Functions that aren't vectorized can be wrapped with `Vectorize()` so you don't have to write the loop yourself (it still loops internally via `mapply`):

foov = Vectorize(foo)
# then you can
wtf$huh4 = foov(wtf$a)
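As a sketch of the `Vectorize` route, assume a hypothetical scalar function `classify(a, c, d)` that encodes the question's rules for a single row. `Vectorize()` wraps it in `mapply()`, so whole columns can be passed in, and each argument keeps its own type because the columns are never combined into a matrix.

```r
wtf <- data.frame(
  a = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  b = c("a", "b", "c", "de", "f", "g"),
  c = c(1, 2, 3, 4, 5, 6),
  d = c(0, 0, 0, 0, 0, 1)
)

# Hypothetical scalar function: handles one value of each argument at a time.
classify <- function(a, c, d) {
  if (a) return("we win")
  if (c < 5) return("hooray")
  if (d == 1) return("a thing")
  "huh?"
}

# Vectorize() returns a wrapper that calls classify() element-wise via mapply().
classify_v <- Vectorize(classify)

wtf$huh <- classify_v(wtf$a, wtf$c, wtf$d)
wtf$huh
```

`mapply(classify, wtf$a, wtf$c, wtf$d)` does the same thing without the wrapper; **purrr**'s `pmap_chr()` is a comparable row-wise option if you already use that package.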

edited Apr 18 '18 at 20:41

answered Apr 18 '18 at 20:33


Is there a way to make this work with a provided function? – ifly6 Apr 18 '18 at 20:37
Then what is it? – ifly6 Apr 18 '18 at 20:39
Details can depend on the function, but something like what I just added in edits. – Gregor Thomas Apr 18 '18 at 20:45
What if the function takes the row as its argument? – ifly6 Apr 18 '18 at 21:00
@ifly6 Then give it the row as an argument. Do you not know how to index data frames? `foo(wtf[i, ])` (in my for loop example). But it's a poor function that expects a single row of a data frame - data frames aren't meant to be worked on one row at a time. – Gregor Thomas Apr 18 '18 at 21:44
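If the function really must take a whole row, one hedged way to do it type-safely is to pass `wtf[i, ]` (a one-row data frame, so each column keeps its type) and collect the results with `vapply()`, which errors unless every call returns exactly one string. `describe_row()` below is a hypothetical stand-in for such a function.

```r
wtf <- data.frame(
  a = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  b = c("a", "b", "c", "de", "f", "g"),
  c = c(1, 2, 3, 4, 5, 6),
  d = c(0, 0, 0, 0, 0, 1)
)

# Hypothetical row-wise function: `row` is a one-row data frame,
# so row$a is logical and row$c / row$d are numeric.
describe_row <- function(row) {
  if (row$a) return("we win")
  if (row$c < 5) return("hooray")
  if (row$d == 1) return("a thing")
  "huh?"
}

# FUN.VALUE = character(1) makes vapply() check that each call
# returns a single string, so a wrong return type fails loudly.
wtf$huh <- vapply(seq_len(nrow(wtf)),
                  function(i) describe_row(wtf[i, ]),
                  character(1))
wtf$huh
```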

score 1 · Answer 2 · answered Apr 18 '18 at 20:36

Probably the easiest fix is `ifelse`, which is vectorized, so you don't need loops or `apply`:

myfunc <- function(df) {
     ifelse(df$a, 'hooray', 'huh?')
}

wtf$huh <- myfunc(wtf)

      a  b    huh
1  TRUE  a hooray
2 FALSE  b   huh?
3  TRUE  c hooray
4 FALSE de   huh?
5  TRUE  f hooray
6  TRUE  g hooray
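The same idea extends to all of the question's rules by nesting `ifelse()`, which keeps the priority order of the original `if`/`return` chain. A minimal sketch, assuming the question's `wtf`:

```r
wtf <- data.frame(
  a = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  b = c("a", "b", "c", "de", "f", "g"),
  c = c(1, 2, 3, 4, 5, 6),
  d = c(0, 0, 0, 0, 0, 1)
)

# Each ifelse() works element-wise on whole columns; the outermost
# condition wins, mirroring the first `return` in the question's function.
wtf$huh <- ifelse(wtf$a, "we win",
           ifelse(wtf$c < 5, "hooray",
           ifelse(wtf$d == 1, "a thing", "huh?")))
wtf$huh
```

Note that `ifelse()` evaluates every branch for every element; that is harmless here but matters if a branch is expensive or can throw an error.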

mohanty · Answer 3 · 2018-05-03T21:29:07.717

One advantage of a data.frame is that it can contain variables of different types.

    lapply(wtf, class)
    $a
    [1] "logical"

    $b
    [1] "factor"

    $huh
    [1] "character"

As noted by Gregor, apply requires a matrix and will convert the object you give it to one if possible. But a matrix cannot contain multiple variable types, so as.matrix looks for a lowest common denominator that can represent all of the data: in this case, character.

    typeof(as.matrix(wtf))    
    [1] "character"

    class(as.matrix(wtf))    
    [1] "matrix" "array"     # R >= 4.0.0; R 3.x prints just "matrix"
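The coercion only bites when the columns have different types. A short sketch, assuming the question's `wtf`, that contrasts the column types seen by `sapply` (which iterates over the data frame's columns as they are) with those seen by `apply`, and shows that an all-numeric subset survives `as.matrix()` as a double matrix:

```r
wtf <- data.frame(
  a = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  b = c("a", "b", "c", "de", "f", "g"),
  c = c(1, 2, 3, 4, 5, 6),
  d = c(0, 0, 0, 0, 0, 1)
)

# sapply() walks over the columns themselves, so types are preserved:
# a is "logical", c and d are "double". (b is "character" in R >= 4.0;
# before that it defaulted to a factor, whose typeof() is "integer".)
sapply(wtf, typeof)

# apply() goes through as.matrix() first, so every column reports "character".
apply(wtf, 2, typeof)

# With only numeric columns, as.matrix() yields a double matrix,
# so apply() is safe here (rowSums() would be the idiomatic choice).
num_only <- as.matrix(wtf[c("c", "d")])
typeof(num_only)                      # "double"
apply(wtf[c("c", "d")], 1, sum)       # 1 2 3 4 5 7
```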
