
I am fairly new to R, and I am having a problem with part of the text pre-processing and cleaning before topic modelling. I am trying to tokenise the text to turn each document into a list of words (punctuation is removed as part of this process). The column is called text:

tokens <- text_input %>% unnest_tokens(words, text)

but I keep getting the following error:

Error in UseMethod("unnest_tokens_") : 
  no applicable method for 'unnest_tokens_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"

My text data currently looks like this:

text     <chr> "mr smiths tenant called for support "
...

I need each document to be turned into a list of words so that spell checking etc. can be done, followed by topic modelling.

Code already tried. The basic data frame is called input, and the cleaned version is called text_input:
Database: spark_connection

$ lines    <chr> "  mr smiths tenant called for support    "

# set the name of the column with your source text

text_col <- "lines"

## Basic cleaning

text_input <- input %>%
  filter(!is.na(!!as.name(text_col))) %>%
  mutate(text = trimws(!!as.name(text_col))) %>%
  mutate(text = tolower(text))

## Tokenise Text
## Turns each document into a list of words; punctuation is removed as part of this process

tokens <- text_input %>% unnest_tokens(words, text)

Error in UseMethod("unnest_tokens_") : 
  no applicable method for 'unnest_tokens_' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"
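The error says that unnest_tokens() (from tidytext) has no method for a tbl_spark: text_input is a lazy reference to a table that lives in Spark, not an ordinary R data frame. One possible workaround, sketched here under the assumption that the data fit in local memory, is to keep the cleaning in Spark, collect() the result into R, and tokenise locally. The data frame below is a hypothetical local stand-in for the collected table, so the snippet runs without a Spark connection:

```r
library(dplyr)
library(tidytext)

# Hypothetical local stand-in for the collected Spark table.
# With the real data this would be: text_local <- text_input %>% collect()
text_local <- data.frame(
  lines = "  mr smiths tenant called for support    ",
  text  = "mr smiths tenant called for support",
  stringsAsFactors = FALSE
)

# Tokenise: one row per word. By default unnest_tokens() lower-cases
# the text and strips punctuation, and drops the input column (text).
tokens <- text_local %>%
  unnest_tokens(words, text)

print(tokens$words)
```

For data too large to collect, sparklyr's own feature transformers such as ft_tokenizer() or ft_regex_tokenizer() tokenise inside Spark instead; that route is not shown here.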
Hayden Y.
  • Hello, can you please share the whole code? Please look at: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Chelmy88 Aug 26 '19 at 14:26
  • try `tokens <- text_input %>% unnest_tokens(word, text, token = "text_input")` – Manuel F Aug 26 '19 at 14:28
  • this is what I get Basic Df called input and then text_input Database: spark_connection $ lines " mr smiths tenant called for support $ text "mr smiths tenant called for support # set the name of the column with your source text text_col <- "lines" ## Basic cleaning ```{r} text_input <- input %>% filter(!is.na(!!as.name(text_col))) %>% mutate(text = trimws(!!as.name(text_col)))%>% mutate(text = tolower(text)) – dazedandconfused Aug 26 '19 at 14:38
  • ctd ## Tokenise Text Turns each document into a list of words- punctuation is removed as part of this process ```{r} tokens <- text_input %>% unnest_tokens(words, text) ``` – dazedandconfused Aug 26 '19 at 14:38
  • It's much easier to follow your code if you edit the question instead of putting it unformatted in comments – camille Aug 26 '19 at 14:39

0 Answers