Using the tidytext package, I want to transform my tibble into a one-token-per-document-per-row format. I converted the text column of my tibble from factor to character, but I still get the same error.

text_df <- tibble(line = 1:3069, text = text)

My tibble looks like this, with a column as character:

# A tibble: 3,069 x 2
line text$text  
<int> <chr> 

However, when I try to apply unnest_tokens:

text_df %>%
  unnest_tokens(word, text$text)

I always get the same error:

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

What is the issue in my code?

PS: I've looked at different posts on the topic but no luck.

Thank you

UseR10085
LG3555
  • It's going to be difficult to really help without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). I don't understand how you have a column named `text$text`, since we don't see that being created anywhere – camille Aug 18 '19 at 16:54
  • `tibble:::print.tbl` displays data frame columns this way; try `tibble(cars = cars)` – moodymudskipper Aug 19 '19 at 13:36

2 Answers

At least part of the problem is the variable name containing a "$". What you are effectively doing in your code is trying to get the element "text" from the object "text", which is likely the function graphics::text and not subsettable.

Change the name of "text$text" or wrap it in backticks:

text_df %>% 
   unnest_tokens(word, `text$text`)

In general you should avoid using special characters in variable names, because it only leads to errors like this one.

If your problem persists, please provide a minimal reproducible example: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)

shs

Your text column is probably itself a data frame with a single text column:

library(tibble)
library(dplyr, warn.conflicts = FALSE)
library(tidytext)

text <- data.frame(text = c("hello world", "this is me"), stringsAsFactors = FALSE)
text_df <- tibble(line = 1:2, text = text)

text_df
#> # A tibble: 2 x 2
#>    line text$text  
#>   <int> <chr>      
#> 1     1 hello world
#> 2     2 this is me

text_df %>% 
  unnest_tokens(word, text$text)

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

Extract the inner text column so that text becomes a plain character vector, then proceed:

text_df <- mutate(text_df, text = text$text)
# or, if your text is stored as a factor:
# text_df <- mutate(text_df, text = as.character(text$text))

text_df
#> # A tibble: 2 x 2
#>    line text       
#>   <int> <chr>      
#> 1     1 hello world
#> 2     2 this is me

text_df %>% 
  unnest_tokens(word, text)
#> # A tibble: 5 x 2
#>    line word 
#>   <int> <chr>
#> 1     1 hello
#> 2     1 world
#> 3     2 this 
#> 4     2 is   
#> 5     2 me
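
If the data frame column was never intended, another option is to pass the character vector itself when building the tibble, so no later mutate() is needed. This is only a minimal sketch, assuming the same toy `text` data frame as in the example above; `tokens` is an illustrative name, not something from the question:

```r
library(tibble)
library(tidytext)

# Toy data (assumption): a data frame holding a single character column
text <- data.frame(text = c("hello world", "this is me"), stringsAsFactors = FALSE)

# Pass the vector text$text, not the whole data frame, so that the
# resulting column is a plain character vector
text_df <- tibble(line = seq_len(nrow(text)), text = text$text)

# unnest_tokens() now receives a character column and can tokenize it
tokens <- unnest_tokens(text_df, word, text)
```

With a real data set, as.character(text$text) would play the same role if the column is stored as a factor.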

It's a good idea to use str(), or sometimes summary(), names() or unclass(), to diagnose this sort of issue:

text <- data.frame(text = c("hello world", "this is me"), stringsAsFactors = FALSE)
text_df <- tibble(line = 1:2, text = text)
str(text_df)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    2 obs. of  2 variables:
#>  $ line: int  1 2
#>  $ text:'data.frame':    2 obs. of  1 variable:
#>   ..$ text: chr  "hello world" "this is me"
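
On a wider tibble, reading the whole str() output can be tedious. A quick programmatic check along the same lines is to ask each column whether it is itself a data frame. This is a sketch using base R's vapply() and is.data.frame(); the name `df_cols` is hypothetical:

```r
library(tibble)

# Same toy setup as above (assumption): a data frame nested as a column
text <- data.frame(text = c("hello world", "this is me"), stringsAsFactors = FALSE)
text_df <- tibble(line = 1:2, text = text)

# A tibble is a list of columns, so vapply() visits each column in turn
# and returns one TRUE/FALSE per column
df_cols <- vapply(text_df, is.data.frame, logical(1))

# Names of the columns that are data frames and need extracting
names(df_cols)[df_cols]
```

Any column flagged here is one that unnest_tokens() would reject until the inner vector is extracted.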
moodymudskipper