
I need to tokenize the text data with the code below, but it generates an error. How can I fix it? Thanks!

library(readr)
library(dplyr)
library(tidytext)

europeecondata <- read_csv("C:/Users/lin/Documents/europeecondata.csv")

european_text <- tibble(line = 1:273, text = europeecondata$text)


european_text$text <- gsub("http[^[:space:]]*", "", european_text$text) # removes both http and https URLs


data(stop_words)
euro_tokens <- european_text$text %>%
   unnest_tokens(word, text) %>%
   anti_join(stop_words) %>%
   count(word, sort = TRUE)

Output: Error in UseMethod("unnest_tokens_") : no applicable method for 'unnest_tokens_' applied to an object of class "character"

J Lin
[See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data; right now we can't run any of your code without any data, and can't see what you're working with. – camille Feb 12 '20 at 00:21

1 Answer


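unnest_tokens expects a data frame (or tibble) as its first argument, tbl. The OP's code extracts the column with european_text$text and pipes that bare character vector into unnest_tokens, which has no method for class "character". Instead, pipe the whole data frame: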

library(tidytext)
library(dplyr)
european_text %>%
    unnest_tokens(word, text)

According to ?unnest_tokens, the usage is

unnest_tokens(tbl, output, input, token = "words", format = c("text", "man", "latex", "html", "xml"), to_lower = TRUE, drop = TRUE, collapse = NULL, ...)

where tbl must be a data frame (a tibble also qualifies).
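If the text is only available as a character vector, one way to satisfy the tbl requirement is to wrap it in a tibble first. This is a minimal sketch, not part of the original answer; the vector txt and the column name text are made up for illustration.

```r
library(dplyr)     # provides %>% and tibble()
library(tidytext)  # provides unnest_tokens()

# Hypothetical character vector standing in for a text column
txt <- c("Pride and Prejudice", "By Jane Austen")

# A bare character vector is not a data frame, so it cannot be passed as tbl
is.data.frame(txt)

# Wrap the vector in a one-column tibble, then pass the tibble as tbl
wrapped <- tibble(text = txt)
is.data.frame(wrapped)

# One row per word; unnest_tokens lowercases by default (to_lower = TRUE)
tokens <- wrapped %>%
  unnest_tokens(word, text)
```

The first is.data.frame call returns FALSE and the second returns TRUE, which is the difference that decides whether unnest_tokens can dispatch.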

Using a reproducible example, passing the column as a vector reproduces the error:

library(janeaustenr)
d <- tibble(txt = prideprejudice)
d$txt %>%
   unnest_tokens(word, txt)

Error in UseMethod("unnest_tokens_") : no applicable method for 'unnest_tokens_' applied to an object of class "character"

If we pass the tibble itself instead, it works:

d %>%
   unnest_tokens(word, txt)
# A tibble: 122,204 x 1
#   word     
#   <chr>    
# 1 pride    
# 2 and      
# 3 prejudice
# 4 by       
# 5 jane     
# 6 austen   
# 7 chapter  
# 8 1        
# 9 it       
#10 is       
# … with 122,194 more rows
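Applied to the question's pipeline, the same fix would look roughly like the sketch below. The OP's CSV is not available, so the two-row european_text here is invented stand-in data; only the pipeline steps come from the question. The by = "word" argument is added so that anti_join does not have to guess the join column.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the OP's data (the real CSV is not available)
european_text <- tibble(
  line = 1:2,
  text = c("Growth in Europe slowed https://example.com/report",
           "Europe faces slower growth and inflation")
)

# Strip URLs; this single pattern covers both http and https
european_text$text <- gsub("http[^[:space:]]*", "", european_text$text)

# Pipe the data frame itself (not european_text$text) into unnest_tokens
euro_tokens <- european_text %>%
  unnest_tokens(word, text) %>%          # one lowercase word per row
  anti_join(stop_words, by = "word") %>% # drop stop words from tidytext's stop_words
  count(word, sort = TRUE)               # word frequencies, most common first
```

The only change that matters for the error is the first line of the pipeline: european_text is passed whole, and text is then named as the input column inside unnest_tokens.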
akrun