I’m new to R and am working with a CSV file that has multiple columns. I want to focus on one column (labelled text in the CSV file), build a corpus from it, and then clean the text in that column: convert it to lower case, remove punctuation, and so on.
The code below is what I have so far:
# Import text data
ALL_tweets_df <- read.csv("All_tweets.csv", stringsAsFactors = FALSE)
library(tm)
# View the structure of tweets
str(ALL_tweets_df)
# Print out the number of rows in tweets
nrow(ALL_tweets_df)
# Isolate text from tweets: All_tweets
ALL_tweets_df <- ALL_tweets_df$text
# Convert the text column into a corpus
mycorpus<-Corpus(VectorSource(ALL_tweets_df$text))
# change to lower case, remove stop words, remove punctuation
mycorpus2 = tm_map(mycorpus, tolower)
mycorpus3 = tm_map(mycorpus2, removeWords, stopwords("english"))
mycorpus4 = tm_map(mycorpus3, removePunctuation)
The problem is at the step where I convert the text column to a corpus: mycorpus ends up as a list of 0. That can’t be right, because there are thousands of tweets in the text column of the CSV file. How can I amend the code so that it works?
Any help would really be appreciated.
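For reference, here is a minimal sketch of the pipeline described above, offered as one plausible fix rather than a confirmed one. In the code as posted, the line `ALL_tweets_df <- ALL_tweets_df$text` replaces the data frame with a plain character vector, so the later `ALL_tweets_df$text` no longer refers to a column. The sketch keeps the data frame intact and stores the text in a separate vector. The small inline data frame and the names `tweets_df` and `tweet_text` are hypothetical stand-ins for the real CSV import. Switching to `VCorpus`, wrapping `tolower` in `content_transformer()`, and adding `stripWhitespace` are extra choices made here, not part of the original code.

```r
library(tm)

# Hypothetical stand-in for:
#   read.csv("All_tweets.csv", stringsAsFactors = FALSE)
tweets_df <- data.frame(
  text = c("Hello, World!", "This is a TEST tweet."),
  stringsAsFactors = FALSE
)

# Keep the data frame intact; copy the text column into its own vector
tweet_text <- tweets_df$text

# Build the corpus from the character vector (not from vector$text)
mycorpus <- VCorpus(VectorSource(tweet_text))

# Base functions such as tolower are wrapped in content_transformer()
# so that each element remains a tm text document
mycorpus <- tm_map(mycorpus, content_transformer(tolower))
mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))
mycorpus <- tm_map(mycorpus, removePunctuation)
mycorpus <- tm_map(mycorpus, stripWhitespace)

# Inspect the result: number of documents and the first cleaned text
length(mycorpus)
as.character(mycorpus[[1]])
```

With the real file, `length(mycorpus)` should match `nrow()` of the imported data frame if the column is really named text; checking `names()` on the data frame after import is a quick way to confirm that assumption.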