I am fairly new to R and I am having a problem with part of the text pre-processing and cleaning before topic modelling. I am trying to tokenise the text to turn each document into a list of words (punctuation is removed as part of this process). The column is called text:
tokens <- text_input %>% unnest_tokens(words, text)
but I keep getting the error message:
Error in UseMethod("unnest_tokens_") :
no applicable method for 'unnest_tokens_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"
My text data currently looks like this:
text <chr> "mr smiths tenant called for support "
...
I need each document turned into a list of words so that spell checking etc. can be completed, followed by topic modelling.
Code already tried: I start with a basic dataframe called input and then create text_input from it. Inspecting it shows it is backed by Spark:
Database: spark_connection
$ lines <chr> " mr smiths tenant called for support "
# set the name of the column with your source text
text_col <- "lines"
## Basic cleaning
text_input <- input %>%
filter(!is.na(!!as.name(text_col))) %>%
mutate(text = trimws(!!as.name(text_col))) %>%
mutate(text = tolower(text))
## Tokenise Text
## Turns each document into a list of words; punctuation is removed as part of this process
tokens <- text_input %>% unnest_tokens(words, text)
which gives the same error:
Error in UseMethod("unnest_tokens_") :
no applicable method for 'unnest_tokens_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"
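From the error it looks as though tidytext's unnest_tokens() has no method for Spark-backed tables (tbl_spark), only for local data frames. A sketch of two workarounds I think might apply - assuming the tidytext and sparklyr packages are loaded, and that the data fits in local memory for the first option:

    library(dplyr)
    library(sparklyr)
    library(tidytext)

    # Option 1: pull the Spark table into local R memory first.
    # collect() turns the tbl_spark into an ordinary tibble,
    # which unnest_tokens() knows how to handle.
    tokens <- text_input %>%
      collect() %>%
      unnest_tokens(words, text)

    # Option 2: tokenise inside Spark instead, using sparklyr's
    # ft_tokenizer(). The data stays distributed; the output
    # column holds a list of words per document.
    tokens_spark <- text_input %>%
      ft_tokenizer(input_col = "text", output_col = "words")

Option 1 is simpler if the corpus is small enough to collect; Option 2 avoids moving the data out of Spark but produces a list column rather than the one-token-per-row shape tidytext works with.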