
I am relatively new to R and use it mostly for text analysis at the moment. In the code below I am trying to find which words are repeated in a text.

tableinter <- intersect(innsong, Inncorpus[[1]])
inntable <- table(innsong)
repwords <- list()
notrepwords <- list()
for(i in length(tableinter)){
  if(inntable[tableinter[i]] > 1){
    repwords[[i]] <- tableinter[[i]]
    return(repwords)
  } else{
    notrepwords[[i]] <- tableinter[[i]]
  }
}

My end goal is to have two lists built from the words that occur in both innsong and Inncorpus[[1]]. One list, repwords, should hold the words from Inncorpus[[1]] that also occur in innsong and have a frequency greater than 1 there. The other list, notrepwords, should hold the shared words that occur only once in the text.

alistaire
    Try to give a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) along with the expected output. – Ronak Shah Dec 08 '17 at 02:03

1 Answer


A much more versatile way to analyse occurrences of words in documents is via a document-feature matrix (or document-term matrix).

I recommend the package quanteda.

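library("quanteda")
library("magrittr")

# Assumption: innsong and Inncorpus[[1]] are character vectors of words,
# so each is collapsed into one document (one dfm row per document).
# Newer quanteda versions expect tokens() before dfm().
df_matrix <- c(song = paste(innsong, collapse = " "),
               corpus = paste(Inncorpus[[1]], collapse = " ")) %>%
    tokens(remove_numbers = TRUE, remove_punct = TRUE, remove_symbols = TRUE) %>%
    dfm(tolower = TRUE)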

This produces a matrix in which each row is a document and each column is a distinct word; each value is the number of times that word occurs in that document. With two documents (the song and the corpus text) you end up with two rows, which you can then query to find your answers.
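As a rough illustration of querying such a matrix, here is a minimal sketch. The two sample texts and the document names "song" and "corpus" are made up, and it assumes a recent quanteda in which the dfm is built from tokens(). It is one possible way to get the two word lists, not the only one.

```r
library(quanteda)

# Hypothetical sample data standing in for innsong and Inncorpus[[1]]
docs <- c(song = "la la love you love",
          corpus = "love me do you know")

# Dense documents-by-words count matrix
m <- as.matrix(dfm(tokens(docs)))

# Words that occur in both documents
shared <- colnames(m)[m["song", ] > 0 & m["corpus", ] > 0]

# Split the shared words by their frequency in the song
repwords <- shared[m["song", shared] > 1]
notrepwords <- shared[m["song", shared] == 1]
```

Converting to a dense matrix is fine for two short documents; for a large corpus you would probably want to keep the sparse dfm and work on it directly.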

In my opinion this approach is more flexible than a hand-written loop, because it also gives you the option to stem words and remove stopwords if you wish.
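If you would rather stay in base R, note that the loop in the question cannot work as written: for(i in length(tableinter)) visits only the last index (seq_along(tableinter) would visit all of them), and return() is an error outside a function. A vectorised sketch avoids the loop entirely. The sample data below is made up, corpus_words is a hypothetical stand-in for Inncorpus[[1]], and both inputs are assumed to be character vectors with one word per element.

```r
# Hypothetical sample data (one word per element)
innsong <- c("la", "la", "love", "you", "love")
corpus_words <- c("love", "me", "do", "you", "know")  # stands in for Inncorpus[[1]]

# Words that occur in both vectors
tableinter <- intersect(innsong, corpus_words)

# Frequency of each shared word within innsong
inntable <- table(innsong)
counts <- as.vector(inntable[tableinter])

# Split the shared words by frequency
repwords <- tableinter[counts > 1]
notrepwords <- tableinter[counts == 1]
```

This yields plain character vectors rather than lists; wrap them in as.list() if a list is really needed.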

Jamie