I'm guessing that the technique for this is similar to taking the first N characters from any dataframe, whether or not it is a corpus.
My attempt:
create.greetings <- function(corpus, create_df = FALSE) {
  for (i in length(Charlotte.corpus.raw)) {
    Doc1 <- Charlotte.corpus.raw[i]
    Word1 <- Doc1[1:25]
    Greetings[i] <- Word1
  }
  return(VCorpus)
}
Where Greetings begins as a corpus with n=6. I couldn't figure out how to make a null corpus, or one large enough to hold the results. I have a corpus of 200 documents here (Charlotte.corpus.raw). Unlike vectors (and by extension, dataframes), there doesn't seem to be an easy way to create null corpora.
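For reference, a minimal sketch of one way to avoid needing an empty corpus at all: instead of filling a pre-built Greetings corpus inside a loop, tm_map() with content_transformer() can transform every document and return a new corpus of the same length. This assumes "first N" means characters rather than words; the helper name first_n_chars and the two sample texts are hypothetical stand-ins, not part of the original code.

```r
library(tm)

# Hypothetical stand-in for Charlotte.corpus.raw (two short documents)
docs <- c("Dear Sir, I write to you regarding the matter of last week.",
          "My dear friend, how long it has been since we last spoke.")
corpus <- VCorpus(VectorSource(docs))

# Hypothetical helper: keep only the first n characters of each document.
# Documents read from files may hold several lines, so the lines are
# pasted together before truncating.
first_n_chars <- function(corpus, n = 25) {
  tm_map(corpus, content_transformer(function(x) {
    substr(paste(x, collapse = " "), 1, n)
  }))
}

Greetings <- first_n_chars(corpus, n = 25)
content(Greetings[[1]])  # text of the first truncated document
```

Separately, for (i in length(x)) visits only the last index, because length(x) is a single number; seq_along(x) iterates over every document.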
Part of the problem is that R doesn't seem to recognize the class of "document". It only seems to recognize a corpus. That is, to R, a single document is a corpus of n=1.
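A small sketch of what may be behind this, using a hypothetical two-document corpus: in tm, single-bracket indexing returns a corpus (here of n=1), while double-bracket indexing returns the document itself, whose text can be read with content().

```r
library(tm)

# Hypothetical two-document corpus for illustration
corpus <- VCorpus(VectorSource(c("first document", "second document")))

one_doc_corpus <- corpus[1]    # single brackets: still a corpus, with one document
one_doc        <- corpus[[1]]  # double brackets: the document object itself

class(one_doc_corpus)
class(one_doc)
content(one_doc)               # the document's text as a character vector
```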
Reproducible sample: you will need the 'tm', 'dplyr', and 'NLP' packages, as well as the more common R packages.
library(tm)     # DirSource, VCorpus, tm_map, content_transformer
library(dplyr)  # provides the %>% pipe

read.corpus <- function(directory, pattern = "", to.lower = TRUE) {
  # Read files and create a `VCorpus` object
  corpus <- DirSource(directory = directory, pattern = pattern) %>%
    VCorpus
  # Lowercase text
  if (to.lower) {
    corpus <- tm_map(corpus, content_transformer(tolower))
  }
  return(corpus)
}
Run the function on any directory containing a few .txt documents and you will have a corpus to work with. Then replace Charlotte.corpus.raw above with whatever you name your corpus.
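If you don't have a suitable directory handy, here is a sketch that builds a throwaway one. The folder name corpus_demo, the file names, and the letter texts are made up for illustration; the two tm calls mirror what read.corpus does with to.lower = TRUE.

```r
library(tm)

# Create a temporary directory with two small text files (hypothetical content)
demo_dir <- file.path(tempdir(), "corpus_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("Dear Sir, thank you for your letter.",
           file.path(demo_dir, "letter1.txt"))
writeLines("My Dear Friend, it has been too long.",
           file.path(demo_dir, "letter2.txt"))

# Same steps as read.corpus(demo_dir, pattern = "\\.txt$"):
# read the files into a VCorpus, then lowercase the text
Charlotte.corpus.raw <- VCorpus(DirSource(directory = demo_dir,
                                          pattern = "\\.txt$"))
Charlotte.corpus.raw <- tm_map(Charlotte.corpus.raw,
                               content_transformer(tolower))

length(Charlotte.corpus.raw)  # number of documents read
```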