I am trying to do word analysis on some data in R. I imported a single column of text responses from a survey using read.csv and named that column "text". This code was working fine a few days ago, and now it is suddenly giving me an error. This is the code I am running:

library(dplyr)
library(tidytext)

A1 <- read.csv("/Users/Laura/Documents/A1.csv")
colnames(A1) <- c("text")

A1 <- A1 %>% unnest_tokens(word, text)

The error I am getting now says this:

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

My data didn't change, and the code I'm using didn't change. :( I don't understand why this is happening, and I am fairly new to R. Is there another package I need to load that I perhaps had loaded before without realizing it?

Here is a link to my data: https://www.dropbox.com/s/amg12jp9qx98slz/A1.csv?dl=0

Thanks for your help

1 Answer

I just used the data you provided on Dropbox, and the following code runs for me with no problems. Maybe try reading the file in as plain lines of text rather than as a CSV?

library(dplyr)
library(tidytext)
library(readr)

# tibble() replaces the deprecated data_frame(); read_lines() returns a character vector
A1 <- tibble(text = read_lines("~/Downloads/A1.csv")) %>%
    mutate(line = row_number())

tidyA1 <- A1 %>%
    unnest_tokens(word, text)

tidyA1
#> # A tibble: 332 × 2
#>     line  word
#>    <int> <chr>
#> 1      1 empty
#> 2      1  your
#> 3      1   cup
#> 4      1  step
#> 5      1    on
#> 6      1   the
#> 7      1  line
#> 8      2  safe
#> 9      2 space
#> 10     3 empty
#> # ... with 322 more rows
Julia Silge
  • So I figured out the problem: the data is being read in as a factor instead of as character. If I pass `stringsAsFactors = FALSE` to read.csv, it works now. But I am still confused about why I did not need that step before and now suddenly do. Any ideas? – Laura Albrecht Apr 07 '17 at 20:10
  • The code you used works for me now too. Thanks. I'm still not sure why I ran into this problem to begin with but at least I have some ways around it! – Laura Albrecht Apr 07 '17 at 20:15
  • Ah, `stringsAsFactors`! It will [get you every time](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/)! I highly recommend reading data in with the readr library, because it handles this more consistently and unexpected factor levels will not come back to bite you. – Julia Silge Apr 07 '17 at 20:17
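
For readers hitting the same error, here is a minimal base-R sketch of the factor-versus-character difference discussed in the comments above. It is an illustration under stated assumptions, not the asker's actual setup: the two-row file written to a temporary path is a hypothetical stand-in for A1.csv, and the names `csv_path`, `a1_factor`, `a1_char`, and `a1_fixed` are made up for this example. Passing `stringsAsFactors = TRUE` explicitly mimics the default that read.csv had in R versions before 4.0.0.

```r
# Hypothetical stand-in for the survey file: a header line plus two responses.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("text", "empty your cup", "safe space"), csv_path)

# Reproduce the problem: with stringsAsFactors = TRUE (the default before
# R 4.0.0), the text column comes back as a factor.
a1_factor <- read.csv(csv_path, stringsAsFactors = TRUE)
class(a1_factor$text)   # "factor"

# Fix 1: ask read.csv for character columns explicitly.
a1_char <- read.csv(csv_path, stringsAsFactors = FALSE)
class(a1_char$text)     # "character"

# Fix 2: convert a column that has already been read in as a factor.
a1_fixed <- a1_factor
a1_fixed$text <- as.character(a1_fixed$text)
identical(a1_fixed$text, a1_char$text)   # TRUE
```

Checking `class()` on the column before tokenizing is a quick way to confirm whether this is the cause of the `check_input` error. Once the column is character, it can be passed to `unnest_tokens(word, text)` as in the answer.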