Text mining: How do I solve this error during tokenization?

Question

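```r
library(tm)                       # TermDocumentMatrix()
suppressWarnings(library(RWeka))  # NGramTokenizer(), Weka_control()

# Bigram tokenizer: emit n-grams of exactly 2 tokens
tokenizer <- function(x)
  NGramTokenizer(x, Weka_control(min = 2, max = 2))

# amzn_cons_corp is a VCorpus built earlier (not shown)
amzn_c_tdm <- TermDocumentMatrix(
  amzn_cons_corp,
  control = list(tokenize = tokenizer)
)
```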

Below is the traceback of the error raised during tokenization. Could you help me understand why this error occurs and how to fix it? I'd really appreciate it.

```
Error in .jcall("RWekaInterfaces", "[S", "tokenize", .jcast(tokenizer, : java.lang.NullPointerException

10. stop(structure(list(message = "java.lang.NullPointerException",
        call = .jcall("RWekaInterfaces", "[S", "tokenize", .jcast(tokenizer, "weka/core/tokenizers/Tokenizer"),
            .jarray(as.character(control)), .jarray(as.character(x))),
        jobj = <S4 object of class structure("jobjRef", package = "rJava")>), .Names = c("message", ...
9.  .jcheck()
8.  .jcall("RWekaInterfaces", "[S", "tokenize", .jcast(tokenizer, "weka/core/tokenizers/Tokenizer"),
        .jarray(as.character(control)), .jarray(as.character(x)))
7.  NGramTokenizer(x, Weka_control(min = 1, max = 2))
6.  .tokenize(doc)
5.  FUN(X[[i]], ...)
4.  lapply(X, FUN, ...)
3.  mclapply(unname(content(x)), termFreq, control)
2.  TermDocumentMatrix.VCorpus(amzn_cons_corp, control = list(tokenize = function(x) NGramTokenizer(x, Weka_control(min = 1, max = 2)))
1.  TermDocumentMatrix(amzn_cons_corp, control = list(tokenize = function(x) NGramTokenizer(x, Weka_control(min = 1, max = 2))))
```
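The exception is thrown inside the Java call made by RWeka (frames 8 to 10 go through rJava's `.jcall`). One possible workaround, not a diagnosis of the NullPointerException, is to build bigrams without Java at all. The sketch below follows the bigram example from the tm FAQ, using `ngrams()` from the NLP package (attached by tm) and `words()`. It assumes tm is installed, uses tm's bundled `crude` corpus as a stand-in for `amzn_cons_corp`, and `BigramTokenizer` is just an illustrative name.

```r
library(tm)  # also attaches NLP, which provides ngrams()

# Illustrative name: a bigram tokenizer that does not call Java/RWeka.
# words() splits a document into tokens; ngrams(, 2) pairs adjacent tokens,
# and paste(collapse = " ") turns each pair into a single "w1 w2" term.
BigramTokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2L), paste, collapse = " "),
         use.names = FALSE)
}

# Sample VCorpus shipped with tm, standing in for amzn_cons_corp
data("crude")

tdm <- TermDocumentMatrix(crude, control = list(tokenize = BigramTokenizer))
inspect(removeSparseTerms(tdm[, 1:10], 0.7))
```

Like the original code, this passes the tokenizer through `control = list(tokenize = ...)` on a VCorpus, so only the tokenizer itself changes.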

You may want to consider updating your question with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Roman Luštrik, Dec 31 '16 at 08:35
