Why can't I tokenize the text data?

Question

I need to tokenize the text data with the code below, but it generates an error. How can I fix it? Thanks!

library(readr)
europeecondata <- read_csv("C:/Users/lin/Documents/europeecondata.csv")

european_text <- data_frame(line=1:273, text=europeecondata$text)


european_text$text <- gsub("http[^[:space:]]*","",  european_text$text) # For http
european_text$text <- gsub("http[^[:space:]]*","", european_text$text) # For https


data(stop_words)
euro_tokens <- european_text$text %>%
   unnest_tokens(word, text) %>%
   anti_join(stop_words)%>%
   count(word, sort=T)

Output: Error in UseMethod("unnest_tokens_") : no applicable method for 'unnest_tokens_' applied to an object of class "character"

[See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data; right now we can't run any of your code without any data, and can't see what you're working with — camille, Feb 12 '20 at 00:21

akrun · Accepted Answer · 2020-02-12T00:00:48.570

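unnest_tokens expects a data frame (the tbl argument) as its first argument. In the OP's code, the text column is extracted with $ and piped in as a character vector, for which no unnest_tokens_ method exists. Instead, pipe the data frame itself: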

library(tidytext)
library(dplyr)
european_text %>%
    unnest_tokens(word, text)

According to ?unnest_tokens, the usage is

unnest_tokens(tbl, output, input, token = "words", format = c("text", "man", "latex", "html", "xml"), to_lower = TRUE, drop = TRUE, collapse = NULL, ...)

where

tbl - A data frame
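A quick way to see the difference is to check the class of the object being piped in. This is only a small sketch: the tibble `df` is a made-up example, not the OP's data.

```r
library(dplyr)

# Hypothetical two-row tibble, used only for illustration
df <- tibble(line = 1:2, text = c("first line", "second line"))

is.data.frame(df)       # TRUE: this can be passed to unnest_tokens()
is.data.frame(df$text)  # FALSE: `$` returns a plain character vector
class(df$text)          # "character", the class named in the error message
```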

Using a reproducible example

library(janeaustenr)
d <- tibble(txt = prideprejudice)
d$txt %>%
   unnest_tokens(word, txt)

Error in UseMethod("unnest_tokens_") : no applicable method for 'unnest_tokens_' applied to an object of class "character"

Instead, if we do

d %>%
   unnest_tokens(word, txt)
# A tibble: 122,204 x 1
#   word     
#   <chr>    
# 1 pride    
# 2 and      
# 3 prejudice
# 4 by       
# 5 jane     
# 6 austen   
# 7 chapter  
# 8 1        
# 9 it       
#10 is       
# … with 122,194 more rows
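Applied to the OP's full pipeline, the same change might look like the sketch below. The CSV is not available, so the two rows of `european_text` are invented stand-ins, and joining with `by = "word"` is an assumption made here to make the join column explicit.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for europeecondata.csv, which is not available here
european_text <- tibble(
  line = 1:2,
  text = c("Growth in the euro area slowed https://example.com/report",
           "Inflation and growth remain the main concerns")
)

data(stop_words)

euro_tokens <- european_text %>%
  # One pattern is enough: "http[^[:space:]]*" also matches https URLs
  mutate(text = gsub("http[^[:space:]]*", "", text)) %>%
  # Pipe the data frame itself, not european_text$text
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
```

Doing the URL clean-up inside mutate() keeps the whole chain operating on the data frame, so there is no point at which a bare vector reaches unnest_tokens.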
