I need to add a column to a data frame based on whether the word Health appears in another column. This code runs fine in R with dplyr, but it fails when I use sparklyr. This is my first time using sparklyr. How can I fix this?

bmk_tbl %>% add_column(healthcare = case_when(
                                          grepl("Health", .$OrganizationType) ~ 1, 
                                          TRUE ~ 0), .after = "OrganizationType")

I get the following error, and I don't know how to fix it:

Error in if (nrow(df) != nrow(.data)) { : missing value where TRUE/FALSE needed

I wasn't sure what to try, so I tried something like this:

bmk_tbl %>% add_column(healthcare = case_when(
                                          (.$OrganizationType %in% c("Health") ~ 1), 
                                          TRUE ~ 0), .after = "OrganizationType")

but this won't work because Health never appears on its own in the data. It is always combined with other words.

10465355
user1828605

1 Answer

You have two unrelated problems here:

  • Mutating primitives like add_column are applicable only to data.frames, and a tbl_spark is not one. This accounts for the following error:

    Error in if (nrow(df) != nrow(.data)) { : missing value where TRUE/FALSE needed
    

    In fact, you should also see an accompanying warning on the first invocation:

    In addition: Warning message:
    `.data` must be a data frame in `add_column()`.
    

    The right function to use here is mutate.

  • grepl is not translated into an SQL primitive. Instead, you should use rlike.

Combined:

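library(sparklyr)
library(dplyr)

# Assumes a local Spark installation; point this at your own cluster if needed
sc <- spark_connect(master = "local")

data <- copy_to(sc, iris, overwrite=TRUE)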

data %>% 
  mutate(match = case_when(
    Species %rlike% "tos" ~ 1,
    TRUE ~ 0
  ))

or simply

data %>%
    mutate(match = as.numeric(Species %rlike% "tos"))
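
Applied to the table from the question, the same idea might look like the sketch below. The small `orgs` data frame and its values are made up for illustration and only stand in for the real bmk_tbl, and the local Spark connection is an assumption.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available
sc <- spark_connect(master = "local")

# Hypothetical stand-in for the real bmk_tbl
orgs <- data.frame(
  OrganizationType = c("Public Health Agency", "University", "Health System"),
  stringsAsFactors = FALSE
)
bmk_tbl <- copy_to(sc, orgs, "bmk_tbl", overwrite = TRUE)

# mutate() instead of add_column(), %rlike% instead of grepl()
result <- bmk_tbl %>%
  mutate(healthcare = as.numeric(OrganizationType %rlike% "Health")) %>%
  collect()

print(result)
```

Note that the column is referenced by its bare name rather than through `.$OrganizationType`, since the expression is translated to SQL and not evaluated in R.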
10465355
  • Thank you. In `instead you should use grepl` did you mean rlike? Is rlike a package or something that comes with sparklyr? I remember trying it, but the error message I received was something related to `unknown function rlike` – user1828605 Mar 14 '20 at 17:43
  • Any functions you use here are not R objects. They are names translated (if possible) to the underlying SQL dialect. The syntax you need is exactly as shown. – 10465355 Mar 17 '20 at 00:28
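
One way to see this translation for yourself is to render the SQL that would be sent to Spark instead of running the query. This is a minimal sketch, assuming a local Spark connection and reusing the iris example from the answer; `query` and `sql_text` are illustrative names.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available
sc <- spark_connect(master = "local")
data <- copy_to(sc, iris, overwrite = TRUE)

# Build the lazy query; nothing is executed on Spark yet
query <- data %>%
  mutate(match = as.numeric(Species %rlike% "tos"))

# Render the generated SQL as text and print it
sql_text <- as.character(dbplyr::sql_render(query))
cat(sql_text, "\n")
```

The printed statement should contain an RLIKE expression, which shows that `%rlike%` is passed through as a SQL operator and is never looked up as an R function.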