dplyr nested ifelse errors - is it vector recycling?

Question

I can write this code that adds two columns to the iris data set. The first added column is a sum of the first four columns. The second added column is my attempt at "programming".

iris.size <- iris %>% 
  mutate(Total = 
           apply(.[(1:4)], 1, sum)
         ) %>% 
  mutate(Size = 
           ifelse(
             apply(.[(1:4)], 1, sum) != 0 & 
               .[2] > .[3], "Output1", 
             ifelse(
               apply(.[(1:4)], 1, sum) == 0 & 
                 .[2] > .[3], "Output2", 
               "Output3")
             )
         )

You'll notice this code does not throw any errors and it does output what I want it to output. But watch what happens when I try my next step in analysis.

iris.size %>% arrange(Size)

Error: Column Size must be a 1d atomic vector or a list

It must be my ifelse logic. Correct? Ifelse logic seems straightforward. If condition 1 then output1, otherwise if condition 2 then output2, otherwise output3.

I ended up forcing iris.size$Size into a vector using as.vector but I'd like to know where my logic went wrong in the first place so I don't have to resort to using band aids in the future. After some googling it sounds like if statements are preferred over ifelse statements in R, but if statements only seem to work on single logical values, not vectors.

Duplicate of a post just asked over an hour ago?! https://stackoverflow.com/q/48016306/5874001 — InfiniteFlash, Dec 29 '17 at 03:14
You'll have to take my word for it. I've got nothing to do with the other question being asked. `.[2] > .[3]` does mean 2nd column greater than 3rd. It'd make more sense if you saw my actual data frame (sensitive info, can't post it here). But the same error happens when applied to iris data set. — stackinator, Dec 29 '17 at 03:26
Total will never be zero if `Sepal.Width > Petal.Length` and the data set contains no `NA` values. — Len Greski, Dec 29 '17 at 03:57
Is there any clear concise guide on when/where/why to use if, ifelse, or else if. I'm not a programmer, I just try to be. Google isn't hitting so great on this question. — stackinator, Dec 29 '17 at 04:01
see https://stackoverflow.com/questions/17252905/else-if-vs-ifelse. In my answer, technically else if wasn't needed...I just combined it with rowwise for readability. This might be slower for large data frames(?) but it is easier (for me) to read. @InfiniteFlassChess' solution is very nice and concise. — jrlewi, Dec 29 '17 at 04:17

jrlewi · Answer 1 · 2017-12-29T03:46:54.987

Making use of rowwise and splitting things up a bit for readability...

iris.size <- iris %>% 
  mutate(Total = 
           apply(.[(1:4)], 1, sum)
  )
# rowwise() evaluates mutate() one row at a time, so scalar if/else and && work here
iris.size <- iris.size %>% rowwise() %>% mutate(Size = 
           if (Total != 0 && Sepal.Width > Petal.Length) {
             "Output1"
           } else if (Total == 0 && Sepal.Width > Petal.Length) {
             "Output2"
           } else { 
             "Output3"
           }
)
class(iris.size$Size)
[1] "character"


> iris.size %>% arrange(Size)
# A tibble: 150 x 7
   Sepal.Length Sepal.Width Petal.Length Petal.Width
          <dbl>       <dbl>        <dbl>       <dbl>
 1          5.1         3.5          1.4         0.2
 2          4.9         3.0          1.4         0.2
 3          4.7         3.2          1.3         0.2
 4          4.6         3.1          1.5         0.2
 5          5.0         3.6          1.4         0.2
 6          5.4         3.9          1.7         0.4
 7          4.6         3.4          1.4         0.3
 8          5.0         3.4          1.5         0.2
 9          4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1
# ... with 140 more rows, and 3 more variables:
#   Species <fctr>, Total <dbl>, Size <chr>
>
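
The difference between if and ifelse() that this answer relies on can be sketched in base R. The vector x and the helper label_one are made up for illustration, and vapply() only stands in for the row-at-a-time evaluation that rowwise() provides above; it is an analogy, not a description of dplyr internals.

```r
# Base R only; x is a made-up example vector.
x <- c(-1, 0, 2)

# ifelse() is vectorised: it tests every element and returns a result
# of the same length as the test.
labels_vec <- ifelse(x > 0, "positive", "not positive")

# if () expects a single TRUE/FALSE, so it has to be applied one element
# (or one row) at a time. label_one is a hypothetical helper.
label_one <- function(v) {
  if (v > 0) "positive" else "not positive"
}
labels_loop <- vapply(x, label_one, character(1))

# Both approaches give the same labels.
identical(labels_vec, labels_loop)
```

On large data frames the vectorised form is usually the faster of the two, which matches the caveat about rowwise in the comments.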

score 1 · Accepted Answer · answered Dec 29 '17 at 03:40

When you run your code, you get this output as iris.size:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Total Sepal.Width
1          5.1         3.5          1.4         0.2  setosa  10.2     Output1
2          4.9         3.0          1.4         0.2  setosa   9.5     Output1
3          4.7         3.2          1.3         0.2  setosa   9.4     Output1
4          4.6         3.1          1.5         0.2  setosa   9.4     Output1
5          5.0         3.6          1.4         0.2  setosa  10.2     Output1
6          5.4         3.9          1.7         0.4  setosa  11.4     Output1

The last column is not displayed as Size because Size was not created as a plain vector. That happens because .[2] > .[3] compares two objects of class data.frame (single-bracket indexing with one index keeps the data frame), not two vectors, which is what .[, 2] > .[, 3] compares.

I'm still trying to understand what is being created. What is that Sepal.Width column?
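
A possible explanation, sketched on iris with base R only: the data frame comparison produces a one-column logical matrix that keeps the column name of its left-hand side, and ifelse() returns a result shaped like its test. The variable names cmp_df, size_df and size_vec are made up for this illustration, and the two-way ifelse() is a simplification of the nested one in the question.

```r
# Single-bracket indexing with one index keeps the data frame;
# adding the comma extracts the column as a plain vector.
class(iris[2])      # "data.frame"
class(iris[, 2])    # "numeric"

# Comparing two data frames yields a logical matrix that carries
# the column name of the left-hand operand.
cmp_df <- iris[2] > iris[3]
dim(cmp_df)         # 150 1
colnames(cmp_df)    # "Sepal.Width"

# ifelse() keeps the shape of its test, so the matrix survives.
size_df <- ifelse(cmp_df, "Output1", "Output3")
is.matrix(size_df)  # TRUE

# With vectors, the result is an ordinary character vector.
size_vec <- ifelse(iris[, 2] > iris[, 3], "Output1", "Output3")
is.matrix(size_vec) # FALSE
```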

Adjust yours with the following:

iris.size <- iris %>% 
  mutate(Total = 
           apply(.[(1:4)], 1, sum)
  ) %>% 
  mutate(Size = 
           ifelse(
             apply(.[(1:4)], 1, sum) != 0 & 
               .[,2] > .[,3], "Output1", 
             ifelse(
               apply(.[(1:4)], 1, sum) == 0 & 
                 .[,2] > .[,3], "Output2", 
               "Output3")
           )
  )

iris.size
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Total    Size
1          5.1         3.5          1.4         0.2  setosa  10.2 Output1
2          4.9         3.0          1.4         0.2  setosa   9.5 Output1
3          4.7         3.2          1.3         0.2  setosa   9.4 Output1
4          4.6         3.1          1.5         0.2  setosa   9.4 Output1
5          5.0         3.6          1.4         0.2  setosa  10.2 Output1
6          5.4         3.9          1.7         0.4  setosa  11.4 Output1

Suggestion:

Here's a condensed version of your code, if you're interested. You can replace Sepal.Width and Petal.Length with .[,2] and .[,3] if need be.

iris.size <- iris %>% 
             mutate(Total = rowSums(.[,sapply(., is.numeric)]),
                    Size = ifelse(Total != 0 & Sepal.Width > Petal.Length, "Output1", 
                           ifelse(Total == 0 & Sepal.Width > Petal.Length, "Output2", "Output3")))%>%
             arrange(Size)

iris.size
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Total    Size
1          5.1         3.5          1.4         0.2  setosa  10.2 Output1
2          4.9         3.0          1.4         0.2  setosa   9.5 Output1
3          4.7         3.2          1.3         0.2  setosa   9.4 Output1
4          4.6         3.1          1.5         0.2  setosa   9.4 Output1
5          5.0         3.6          1.4         0.2  setosa  10.2 Output1
6          5.4         3.9          1.7         0.4  setosa  11.4 Output1
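
As a quick sanity check on the condensed version, the rowSums() call over the numeric columns should give the same totals as the original apply() over columns 1 to 4. This is a base R sketch on iris; the names num_cols, total_rowsums and total_apply are made up for the illustration.

```r
# TRUE for the four measurement columns, FALSE for Species.
num_cols <- sapply(iris, is.numeric)

# Condensed form: vectorised row sums over the numeric columns.
total_rowsums <- rowSums(iris[, num_cols])

# Original form: apply() calling sum() once per row.
total_apply <- apply(iris[1:4], 1, sum)

# Compare values only, ignoring any names attached to the results.
isTRUE(all.equal(unname(total_rowsums), unname(total_apply)))
```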

My actual data set - this is what I'm trying to achieve: I've got six months of historical info that I compare to the current month's data (in seven different columns). If the six months of historical info is blank (zeroes) I have to perform one analysis. If the sum of the six columns is any number greater than zero (there's info there) I perform another analysis. So I've got an ifelse w/apply statement that is very similar to the one shown in the iris data set. BASICALLY - I look at the summation of the columns, and the relationship between them, and perform one of several sets of analysis. — stackinator, Dec 29 '17 at 03:59

score 0 · Answer 3 · answered Dec 29 '17 at 03:36

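The error message is caused by the fact that the Size column is not a plain atomic vector. Inside the data frame it is stored as a one-column character matrix (note the [1:150, 1] dimensions and the "Sepal.Width" dimnames below). This can be confirmed by the str() function: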

> str(iris.size["Size"])
'data.frame':   150 obs. of  1 variable:
 $ Size: chr [1:150, 1] "Output1" "Output1" "Output1" "Output1" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Sepal.Width"
>

Casting the column with as.vector() resolves the problem: it drops the dimensions and dimnames, and because there is only 1 column the values keep their row order.
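
A minimal base R sketch of what as.vector() does to such a column. The matrix size_mat is a hand-built, hypothetical stand-in for the Size column, not the real data.

```r
# Hypothetical stand-in for the Size column: a one-column character
# matrix with the same kind of dimnames as in the str() output.
size_mat <- matrix(c("Output1", "Output3", "Output1"), ncol = 1,
                   dimnames = list(NULL, "Sepal.Width"))

# as.vector() strips the dim and dimnames attributes.
size_vec <- as.vector(size_mat)
is.null(dim(size_vec))         # TRUE
is.null(attributes(size_vec))  # TRUE

# Extracting the single column gives the same plain vector.
identical(size_vec, size_mat[, 1])
```

Using .[, 2] > .[, 3] in the first place avoids creating the matrix, so no cast is needed afterwards.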
