A reproducible example (adapted from @forestfanjoe's answer):
library(dplyr)
library(sparklyr)
sc <- spark_connect(master = "local")
df <- data.frame(id = 1:100, PaymentHistory = runif(n = 100, min = -1, max = 2))
df <- copy_to(sc, df, "payment")
> head(df)
# Source: spark<?> [?? x 2]
     id PaymentHistory
* <int>          <dbl>
1     1         -0.138
2     2         -0.249
3     3         -0.805
4     4          1.30
5     5          1.54
6     6          0.936
fix_PaymentHistory <- function(df) {
  df %>% dplyr::mutate(PaymentHistory = dplyr::if_else(PaymentHistory < 0, 0, dplyr::if_else(PaymentHistory > 1, 1, PaymentHistory)))
}
df %>% fix_PaymentHistory
The error is:
Error in dplyr::if_else(PaymentHistory < 0, 0, dplyr::if_else(PaymentHistory > :
object 'PaymentHistory' not found
I'm using the scope operator (::) because I'm afraid that names exported
by dplyr will clash with some of the user-defined code. Note that
PaymentHistory is a column in df.
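As a sanity check, here is a minimal sketch that runs the same namespaced function on a plain local data frame instead of the Spark table. It should run without error, which suggests the problem lies in how the Spark-backed table handles dplyr::-qualified calls rather than in the function itself. The names local_df and res are hypothetical and introduced only for this illustration.

```r
library(dplyr)

# Local stand-in for the Spark table (hypothetical name: local_df)
set.seed(1)
local_df <- data.frame(id = 1:100,
                       PaymentHistory = runif(n = 100, min = -1, max = 2))

# Same namespaced definition as in the failing Spark example
fix_PaymentHistory <- function(df) {
  df %>% dplyr::mutate(PaymentHistory = dplyr::if_else(PaymentHistory < 0, 0, dplyr::if_else(PaymentHistory > 1, 1, PaymentHistory)))
}

# On a local data frame the column is found and the values are clamped to [0, 1]
res <- local_df %>% fix_PaymentHistory()
range(res$PaymentHistory)
```

If this works locally but the Spark version fails, the namespacing itself is not wrong R; only the remote-table code path differs.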
The error does not occur when the same code is run without the dplyr:: prefixes:
fix_PaymentHistory <- function(df){
  df %>% mutate(PaymentHistory = if_else(PaymentHistory < 0, 0, if_else(PaymentHistory > 1, 1, PaymentHistory)))
}
> df %>% fix_PaymentHistory
# Source: spark<?> [?? x 2]
      id PaymentHistory
 * <int>          <dbl>
 1     1         0
 2     2         0
 3     3         0
 4     4         1
 5     5         1
 6     6         0.936
 7     7         0
 8     8         0.716
 9     9         0
10    10         0.0831
# ... with more rows