How do you print a small sample, or first line, of a corpus in R using the tm package?
I have a very large corpus (> 1 GB) and am doing some text cleaning. I would like to test the results as I apply each cleaning procedure. Printing just the first line, or the first few lines, of the corpus would be ideal.
# Load Libraries
library(tm)
# Read in Corpus
corp <- SimpleCorpus(DirSource("C:/TextDocument"))
# Remove punctuation
# tm_map applies the transformation to every document in the corpus
corp <- tm_map(corp, removePunctuation,
               preserve_intra_word_contractions = TRUE,
               preserve_intra_word_dashes = TRUE)
I have tried accessing the corpus several ways:
# Print first line of first element of corpus
corp[[1]][[1]]
# Print first line using 'content' element of corpus
corp[[1]]$content[[1]]
Both of these result in very long run times without the desired output.
The crude corpus in the tm package can be used for example purposes.
data("crude")
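
To make the goal concrete, here is a minimal sketch of the kind of preview I am after, using crude. It assumes that each document stores its text with "\n" line separators and that cleaning a three-document subset is representative; the names sample_corp, first_line, and previews are only illustrative. It runs quickly on crude, but I have not confirmed that the same approach is fast on the large SimpleCorpus.

```r
library(tm)
data("crude")  # VCorpus of 20 Reuters articles shipped with tm

# Clean only a small subset so the check runs quickly
sample_corp <- tm_map(crude[1:3], removePunctuation,
                      preserve_intra_word_contractions = TRUE,
                      preserve_intra_word_dashes = TRUE)

# Text of the first document as a character vector
first_doc <- as.character(sample_corp[[1]])

# First line of the first document (assumes "\n" separates lines)
first_line <- strsplit(first_doc, "\n", fixed = TRUE)[[1]][1]
print(first_line)

# First 80 characters of each document in the subset
previews <- lapply(sample_corp, function(d) substr(as.character(d)[1], 1, 80))
print(previews)
```

Subsetting with single brackets (crude[1:3]) keeps the result a corpus, so the same tm_map call can later be pointed at the full corpus once the output looks right.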