
I'm having a hard time with an if_else statement in R/dplyr. My goal is to check a column for specific Nexus phone models and create a new column that says "android phone" when one of those models is found, and otherwise takes the value of the device_type column in the same row. I keep getting an error from the false argument of the following code. How can I get it to refer to the other column? I'm also wondering whether there is a way to make the conditions more concise. newdevice is the column I'm creating. Thanks!

#Correct Nexus issue
df$newdevice <- if_else(df$wurfl_model_name == "Nexus 5" | df$wurfl_model_name == "Nexus 7" | df$wurfl_model_name == "Nexus 6P" | df$wurfl_model_name == "Nexus 6" | df$wurfl_model_name == "Nexus 5X" | df$wurfl_model_name == "Nexus" | df$wurfl_model_name == "Nexus 4", "android phone", df$device_type) 
Tyler
  • What is the error you are getting? To make it concise you could do a `grep` to just look for `Nexus`. Something like `grepl("Nexus", df$model_name)`. – Anonymous coward Oct 10 '18 at 18:21
  • Warning message: Unknown or uninitialised column: 'device_type'. Re: grep, the problem is that some Nexus devices are tablets, so I need to list the specific models to correct (an error in another column is causing phones to be mislabeled). – Tyler Oct 10 '18 at 18:24
  • Does device type specify that? You could do the `grep` and `df$device_type != "tablet"` within the `ifelse`. Or like using `%in%` as suggested below. Without seeing your data, it's difficult to tell. Can you post a [minimal reproducible example](https://stackoverflow.com/a/5963610/2359523)? – Anonymous coward Oct 10 '18 at 18:32
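A rough sketch of the approach suggested in the last comment, using only base R. The data frame below is invented for illustration, and the approach assumes device_type is reliable for tablets, which may not hold for the data described in the question.

```r
# Toy data: all values are made up for illustration.
df <- data.frame(
  wurfl_model_name = c("Nexus 5", "Nexus 9", "iPhone 7"),
  device_type      = c("phone", "tablet", "ios phone"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose model name contains "Nexus" and that are not tablets.
is_nexus_phone <- grepl("Nexus", df$wurfl_model_name, fixed = TRUE) &
  df$device_type != "tablet"

# Relabel the matching rows; every other row keeps its device_type.
df$newdevice <- ifelse(is_nexus_phone, "android phone", df$device_type)
df$newdevice
```

If device_type itself is what is mislabeled, an explicit list of model names is the safer route.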

3 Answers

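I think your issue is that df$device_type is a factor rather than a character vector. if_else() is stricter than base ifelse() about the types of its true and false arguments, and "android phone" is a character string, so coercing the column to character should solve your problem: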

df$device_type <- as.character(df$device_type)
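To see what the coercion changes, here is a small base R sketch. The vector is invented for illustration and is not the asker's data.

```r
# Toy factor: the values are made up for illustration.
device_type <- factor(c("phone", "tablet", "phone"))

class(device_type)        # a factor stores integer codes plus a set of levels
levels(device_type)
as.integer(device_type)   # the underlying codes, not the labels

# as.character() returns the labels as an ordinary character vector.
device_type_chr <- as.character(device_type)
class(device_type_chr)
device_type_chr
```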

Additionally, you can make your code simpler by using the %in% operator:

df$newdevice <- if_else(
  df$wurfl_model_name %in%
    c(
      "Nexus 5",
      "Nexus 7",
      "Nexus 6P",
      "Nexus 6",
      "Nexus 5X",
      "Nexus",
      "Nexus 4"
    ),
  "android phone",
  df$device_type
)
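As a quick check that %in% behaves like the chained == comparisons, here is a base R sketch on an invented vector. One difference worth knowing: a missing model name gives NA with ==, but FALSE with %in%. The name nexus_phones is introduced here for illustration only.

```r
# Model list taken from the question; the test vector is made up.
nexus_phones <- c("Nexus 5", "Nexus 7", "Nexus 6P", "Nexus 6",
                  "Nexus 5X", "Nexus", "Nexus 4")
model <- c("Nexus 5", "Nexus 9", "iPhone 7", NA)

# Equivalent of model == "Nexus 5" | model == "Nexus 7" | ... for the whole list.
chained <- Reduce(`|`, lapply(nexus_phones, function(m) model == m))

# Same membership test written with %in%.
by_in <- model %in% nexus_phones

chained
by_in
```

With %in%, rows with a missing model name fall through to the false branch instead of producing NA.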
dave-edison
  • You guys are amazing. That solved the problem. One last question re: best practices. I often need to modify data in a column, but when I do things in R, I typically create a new column and delete the old one. For something like the above, is that the right way to go, or could I just as easily have modified the original device column? – Tyler Oct 10 '18 at 18:41

@Tyler, I cannot add a comment to the correct answer, but I think a good argument could be made either way. I personally prefer to keep the old column, since it lets you check that your code ran as planned. However, if you are using dplyr, I would encourage you to use mutate to create new variables. That would change DiceBoyT's answer to something like this:

library(tidyverse)

df <- df %>%
      mutate(device_type = as.character(device_type),
             newdevice = if_else(wurfl_model_name %in% 
                                   c("Nexus 5", "Nexus 7", "Nexus 6P",
                                     "Nexus 6", "Nexus 5X", "Nexus", "Nexus 4"),
                                 "android phone", device_type))
Andrew

This could also be solved using case_when from dplyr, which I think is a little neater, and also generalizes to instances where there are more than two outcomes (i.e., an if, else if, and else, instead of just if and else).

library(tidyverse)


df <- df %>%
  mutate(
    device_type = as.character(device_type),
    newdevice = case_when(
      wurfl_model_name %in% c("Nexus 5", "Nexus 7", "Nexus 6P", "Nexus 6", "Nexus 5X", "Nexus", "Nexus 4") ~ "android phone",
      TRUE ~ device_type
    )
  )

In the case_when, each line is basically an if statement. If the condition on the first line is met, its right-hand side is used. Otherwise, if the condition on the second line is met, that one is used, and so on. The TRUE on the last line is your else: it always evaluates to true, so any row that reaches the last line of the case_when gets that value. Full documentation for case_when can be found on the dplyr website.
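As a sketch of the "more than two outcomes" case, the example below adds a second condition on invented data. The nexus_tablets list and the "android tablet" label are assumptions made for illustration and do not come from the question.

```r
library(dplyr)

# Toy data: all values are made up for illustration.
df <- data.frame(
  wurfl_model_name = c("Nexus 5", "Nexus 9", "iPhone 7"),
  device_type      = c("tablet", "phone", "ios phone"),
  stringsAsFactors = FALSE
)
nexus_phones  <- c("Nexus 5", "Nexus 7", "Nexus 6P", "Nexus 6",
                   "Nexus 5X", "Nexus", "Nexus 4")
nexus_tablets <- c("Nexus 9", "Nexus 10")   # hypothetical second group

result <- df %>%
  mutate(
    newdevice = case_when(
      wurfl_model_name %in% nexus_phones  ~ "android phone",   # if
      wurfl_model_name %in% nexus_tablets ~ "android tablet",  # else if
      TRUE                                ~ device_type        # else
    )
  )
result$newdevice
```

Conditions are checked top to bottom and the first match wins, so the more specific conditions should come first.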

Jake Thompson