
I am trying to convert a dgCMatrix to a data.table in R using the following code:

```r
feats <- as.data.table(as.matrix(dtm_text))
```

But it throws this error:

```
Error in nchar(collabs) : invalid multibyte string, element 149
```

Does anyone know the reason for this error, or another way to achieve the same result?

Here is the code that runs just before the problem line:

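```r
library(text2vec)
library(data.table)

# Tokenize the lower-cased descriptions, one document per User_ID
bow <- itoken(trte_data$Description, preprocessor = tolower,
              tokenizer = word_tokenizer, ids = trte_data$User_ID)
# Build the vocabulary and keep only terms counted at least 100 times
bow_vocab <- create_vocabulary(bow)
pruned_bow <- prune_vocabulary(bow_vocab, term_count_min = 100)
vovec <- vocab_vectorizer(pruned_bow)
dtm_text <- create_dtm(bow, vovec)  # sparse document-term matrix
```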
  • It looks like you have an encoding issue. Could you provide a small reproducible example? – nghauran Oct 12 '17 at 09:42
  • Perhaps [this](https://stackoverflow.com/questions/14363085/invalid-multibyte-string-in-read-csv) or [this](https://stackoverflow.com/questions/4993837/r-invalid-multibyte-string) may help – akrun Oct 12 '17 at 09:56
  • @ANG I have added the code just before the problem line – Hartej Singh Kathuria Oct 12 '17 at 10:55
  • Do you have a column called `collabs` in the dgCMatrix? Try this: `feats <- as.data.table(as.matrix(dtm_text), encoding = "UTF-8")` – nghauran Oct 12 '17 at 12:15
  • @ANG Nope, I don't have any such column. Also, this solution didn't work. Could the issue be in how I am reading the file? I am using fread for that; should I include the encoding argument there? – Hartej Singh Kathuria Oct 13 '17 at 05:01
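The `nchar(collabs)` call hints (an assumption, not confirmed in this thread) that the failure comes from the matrix's column names, i.e. the vocabulary terms, at least one of which may not be valid UTF-8. A minimal sketch of one possible workaround under that assumption: find the column names that fail `validUTF8()` and re-encode them with `iconv()` before converting. The toy matrix, its terms, and the guess that the stray bytes are Latin-1 are all hypothetical; the real encoding of `trte_data$Description` may be different.

```r
library(Matrix)
library(data.table)

# Hypothetical toy document-term matrix standing in for dtm_text.
# The third term holds the byte \xe9, which is valid Latin-1 but not
# valid UTF-8, imitating a mis-encoded vocabulary term.
terms <- c("apple", "banana", "caf\xe9")
m <- sparseMatrix(i = c(1, 2, 2), j = c(1, 2, 3), x = c(1, 2, 1),
                  dims = c(2, 3), dimnames = list(NULL, terms))

# Flag column names whose bytes are not valid UTF-8
bad <- !validUTF8(colnames(m))

# ASSUMPTION: the offending names are Latin-1; convert only those to UTF-8
colnames(m)[bad] <- iconv(colnames(m)[bad], from = "latin1", to = "UTF-8")

feats <- as.data.table(as.matrix(m))
```

If the source encoding is known, fixing it at read time may be cleaner, for example via `fread()`'s `encoding` argument (`"UTF-8"` or `"Latin-1"`), so the terms are never mis-encoded in the first place.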
