I have a sample data frame df in R and its Spark counterpart rd_3 in sparklyr. I want to create a visit_category column in the Spark data frame. I know the cut() function can create this column in R, but how do I do the same in sparklyr?
For reproducibility:
df <- data.frame(visit_duration = c(12, 20, 70, 100), city = c("X", "X", "X", "X"), visit_category = c("0-15", "15-25", "25-80", "80-120"))
rd_3 <- copy_to(sc, df)
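For reference, here is a minimal local sketch of the desired result using base R's cut(), with no Spark involved. The `breaks` and `labels` vectors are illustrative names that cover only the four example bins above, not the real set of 50+ bins. Note that cut() uses right-closed intervals by default, so 15 would fall into "0-15".

```r
# Local (non-Spark) sketch of the desired binning with base R's cut().
df <- data.frame(visit_duration = c(12, 20, 70, 100),
                 city = c("X", "X", "X", "X"))

# Illustrative bin edges and labels for the four example bins only.
breaks <- c(0, 15, 25, 80, 120)
labels <- c("0-15", "15-25", "25-80", "80-120")

# cut() returns a factor; intervals are (0,15], (15,25], (25,80], (80,120].
df$visit_category <- cut(df$visit_duration, breaks = breaks, labels = labels)
print(df)
```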
I cannot use ifelse statements because there are more than 50 bins. I tried ft_bucketizer in sparklyr as follows, but it fails with an error:
rd_3 %>%
ft_bucketizer("visit_duration", "Visit_Category", splits = c(0, 15, 25, 80 , 120)) %>%
mutate(Visit_Category = factor(Visit_Category, labels = c("0-15","15-25","25-80","80-120")))
This is the error I get:
Error: org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'AS' expecting {')', ','}(line 1, pos 98)
== SQL ==
SELECT `new_col`, `visit_duration`, FACTOR(`Visit_Category`, ("0-15",
"15-25", "25-80", "80-120") AS "labels") AS `Visit_Category`
In addition: Warning message:
Named arguments ignored for SQL FACTOR