I am trying to perform text analytics on the following text file. The code I have written to tokenize this text after importing it is:

my_data <- read.delim("5KjlUO.txt")
library(tokenizers)
library(SnowballC)
tokenize_words(my_data$ACT.I)
tokenize_words(my_data)

I am getting the following error:

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

Can someone help me resolve this issue?

    Please tag your programming language ([tag:r]?), so people who have experience with it will be alerted to your question; and also note that we prefer any text to be actually text, not images. – Amadan Aug 24 '18 at 10:22
    Make your text available outside of kaggle. Not everyone has an account / wants to go through the hassle of downloading the file from kaggle. Check this post on how to make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – phiver Aug 24 '18 at 11:40
    What does `class(my_data$ACT.I)` return? Are you sure it's a character vector? If you used `read.delim()` like that, it's probably a factor variable. Try `tokenize_words(as.character(my_data$ACT.I))`. – MrFlick Aug 24 '18 at 14:58
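Putting MrFlick's suggestion together, a minimal sketch of the likely fix. It assumes the file's first line is a header that `read.delim()` turns into the column name `ACT.I` (as in the question); `stringsAsFactors = FALSE` and the `as.character()` coercion are additions not in the original code:

```r
library(tokenizers)

# Read the file; stringsAsFactors = FALSE keeps text columns as character
# vectors instead of factors (the default before R 4.0).
my_data <- read.delim("5KjlUO.txt", stringsAsFactors = FALSE)

# tokenize_words() requires a character vector, not a whole data frame,
# so pass the column itself, coerced in case it was read as a factor.
tokens <- tokenize_words(as.character(my_data$ACT.I))
```

Calling `tokenize_words(my_data)` fails because `my_data` is a data frame, which is why `check_input(x)` raises the "Input must be a character vector" error.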
