
I am working with tidytext. When I call unnest_tokens(), R returns the error:

Please supply column name

How can I solve this error?

library(tidytext)
library(tm)
library(dplyr)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
  #Build a corpus: a collection of statements
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
f <- Corpus(DirSource("C:/Users/Boon/Desktop/Dissertation/F"))
doc_dir <- "C:/Users/Boon/Desktop/Dis/F/f.csv"
doc <- read.csv(doc_dir, header = TRUE)
docs <- Corpus(DataframeSource(doc))
dtm <- DocumentTermMatrix(docs)
text_df <- data_frame(line = 1:115, docs = docs)

# This is the output from the code above, which is fine:
# text_df
# # A tibble: 115 x 2
#     line          docs
#    <int> <S3: VCorpus>
#  1     1 <S3: VCorpus>
#  2     2 <S3: VCorpus>
#  3     3 <S3: VCorpus>
#  4     4 <S3: VCorpus>
#  5     5 <S3: VCorpus>
#  6     6 <S3: VCorpus>
#  7     7 <S3: VCorpus>
#  8     8 <S3: VCorpus>
#  9     9 <S3: VCorpus>
# 10    10 <S3: VCorpus>
# # ... with 105 more rows

unnest_tokens(word, docs)

# Error: Please supply column name
James Z
SChatcha
  • http://stackoverflow.com/help/mcve – Hack-R Jul 20 '17 at 16:46
  • you need to pass the data frame as the first argument, like this `unnest_tokens(tbl = text_df, output = words, input = docs)` – Nate Jul 20 '17 at 17:00
  • Dear Nate, thank you very much for your help. It seems to work. However, it produces the following error. – SChatcha Jul 20 '17 at 20:26
  • Error in unnest_tokens_(tbl, output_col, input_col, token = token, to_lower = to_lower, : unnest_tokens expects all columns of input to be atomic vectors (not lists) – SChatcha Jul 20 '17 at 20:26
  • This happens because your tibble contains a Corpus object in the *docs* column, so `unnest_tokens` treats it as a list. As the error message says, your docs column needs to be an atomic vector. – Juan Bosco Jul 20 '17 at 21:15
  • Thank you, Juan. How can I convert my docs column into an atomic vector? – SChatcha Jul 20 '17 at 21:29
  • Anyway, thank you! – SChatcha Aug 04 '17 at 11:16
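One way to answer that follow-up, sketched under assumptions: the two texts below are hypothetical stand-ins for the corpus built from the CSV, and `content()` (from the NLP package, which tm attaches) is used to pull each document's text out as a single string. Collapsing with `"\n"` keeps a multi-line document as one element, so the resulting column is an atomic character vector that `unnest_tokens()` accepts, with the data frame passed as its first argument (`tbl`).

```r
library(tm)       # VCorpus(), VectorSource(); attaches NLP, which provides content()
library(dplyr)    # tibble()
library(tidytext) # unnest_tokens()

# Hypothetical stand-in for the corpus the question builds from its CSV
docs <- VCorpus(VectorSource(c("This is an excellent document.",
                               "Happy birthday!")))

# Extract each document's text as one string, so the result is an atomic
# character vector rather than a list of corpus objects
texts <- vapply(docs,
                function(d) paste(content(d), collapse = "\n"),
                character(1))

# unname() drops the document ids that vapply() carries over as names
text_df <- tibble(line = seq_along(texts), text = unname(texts))

# Data frame first (tbl), then the output column name, then the input column
tokens <- unnest_tokens(text_df, word, text)
tokens
```

As the answer below points out, if the text starts out in a CSV it is simpler to skip the corpus entirely.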

1 Answer


If you want to convert your text data to a tidy format, you do not need to transform it into a corpus, a document-term matrix, or anything else first. That is one of the main ideas behind using a tidy data format for text: you don't use those other formats unless you need them for modeling.

You just put the raw text into a data frame, then use unnest_tokens() to tidy it. (I am making some assumptions here about what your CSV looks like; it would be more helpful to post a reproducible example next time.)

library(dplyr)

docs <- tibble(line = 1:4,
               document = c("This is an excellent document.",
                            "Wow, what a great set of words!",
                            "Once upon a time...",
                            "Happy birthday!"))

docs
#> # A tibble: 4 x 2
#>    line                        document
#>   <int>                           <chr>
#> 1     1  This is an excellent document.
#> 2     2 Wow, what a great set of words!
#> 3     3             Once upon a time...
#> 4     4                 Happy birthday!

library(tidytext)

docs %>%
    unnest_tokens(word, document)
#> # A tibble: 18 x 2
#>     line      word
#>    <int>     <chr>
#>  1     1      this
#>  2     1        is
#>  3     1        an
#>  4     1 excellent
#>  5     1  document
#>  6     2       wow
#>  7     2      what
#>  8     2         a
#>  9     2     great
#> 10     2       set
#> 11     2        of
#> 12     2     words
#> 13     3      once
#> 14     3      upon
#> 15     3         a
#> 16     3      time
#> 17     4     happy
#> 18     4  birthday
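Applied to the question's CSV, the same approach might look like the sketch below. It assumes, hypothetically, that each statement sits in a column called `text`; the temporary file exists only so the sketch runs on its own, and with real data you would point `read.csv()` at your own file instead.

```r
library(dplyr)
library(tidytext)

# Write a small CSV to a temp file so the sketch is self-contained.
# The column name "text" is an assumption about the real CSV's layout.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(text = c("Once upon a time...", "Happy birthday!")),
          csv_path, row.names = FALSE)

# Read the raw text straight into a data frame: no Corpus or DTM needed.
# stringsAsFactors = FALSE keeps text as character in R versions before 4.0.
doc <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

tidy_doc <- doc %>%
  mutate(line = row_number()) %>%  # remember which row each word came from
  unnest_tokens(word, text)
tidy_doc
```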
Julia Silge
  • If you do actually have your data in a document term matrix already (from tm, for example), then what you want to do is [`tidy()`](http://tidytextmining.com/dtm.html) it, not use `unnest_tokens()`. – Julia Silge Jul 21 '17 at 18:49
  • Thank you very much Julia :) – SChatcha Aug 04 '17 at 11:07
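For the case in the comment above, where a tm document-term matrix already exists, a minimal sketch of calling `tidy()` on it follows. The two-document corpus is a hypothetical stand-in for real data; the result has one row per non-zero document-term pair.

```r
library(tm)       # VCorpus(), VectorSource(), DocumentTermMatrix()
library(tidytext) # tidy() method for DocumentTermMatrix objects

# Hypothetical two-document corpus, standing in for an existing DTM
docs <- VCorpus(VectorSource(c("happy birthday", "happy new year")))
dtm <- DocumentTermMatrix(docs)

# Columns: document, term, count (zero entries are not stored)
tidy_dtm <- tidy(dtm)
tidy_dtm
```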