I am working on a text mining project in R. The file is over 100 MB. I managed to read the file and do some text processing; however, when I get to the point of removing stop words, RStudio crashes. What would be the best solution, please?
Should I split the file into 2 or 3 files, process them, and then merge them again before applying any analytics? Does anyone have code to split it? I tried several options available online and none of them seems to work.
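To make the question concrete, this is the kind of split/process/merge I have in mind, in base R (a rough, untested sketch; txt stands for the character vector of documents, n_chunks is arbitrary, and the processing step is a placeholder):

# Split the documents into a few chunks, process each, then merge
n_chunks <- 3
chunk_id <- cut(seq_along(txt), breaks = n_chunks, labels = FALSE)
chunks <- split(txt, chunk_id)
processed <- lapply(chunks, function(x) x)  # placeholder for the real cleaning
txt_merged <- unlist(processed, use.names = FALSE)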
Here is the code I used. Everything worked smoothly except for removing the stop words:
# Install
install.packages("tm") # for text mining
install.packages("SnowballC") # for text stemming
install.packages("wordcloud") # word-cloud generator
install.packages("RColorBrewer") # color palettes
# Load
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
library("readr")
# Read the file and build a corpus
# (note: VectorSource expects a character vector; passing a whole
# data frame makes each column one document)
doc <- read_csv(file.choose())
docs <- Corpus(VectorSource(doc))
# Inspect the corpus
docs
# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove common English stop words
docs <- tm_map(docs, removeWords, stopwords("english"))
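If it helps, here is the chunked version of the failing step that I am considering, instead of one tm_map call over the whole corpus (an untested sketch; doc$text stands for whichever column holds the text, and the chunk size of 10000 is a guess):

# Untested sketch: clean the text in chunks, then rebuild one corpus
clean_chunk <- function(txt) {
  ch <- Corpus(VectorSource(txt))
  ch <- tm_map(ch, content_transformer(tolower))
  ch <- tm_map(ch, removeNumbers)
  ch <- tm_map(ch, removeWords, stopwords("english"))
  sapply(ch, as.character)  # back to a plain character vector
}
chunk_id <- ceiling(seq_along(doc$text) / 10000)
cleaned <- unlist(lapply(split(doc$text, chunk_id), clean_chunk), use.names = FALSE)
docs <- Corpus(VectorSource(cleaned))

Would batching like this actually avoid the crash, or is there a better way (for example, a more memory-efficient package)?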