
I would like to use the tm package to transform the columns of a data frame, i.e. apply functions such as content_transformer and removePunctuation to a column.

For example, given the data frame below:

df <- data.frame(a=c("I love TEXTMINING","Here I GO, Again!!"))

I would like to use content_transformer to convert df$a to lower case and removePunctuation to strip the punctuation, so that the output looks like this:

                  a
1 i love textmining
2   here i go again

Is there a way to perform the above specifically using the functions in the tm package?

Molia
  • You can try a regex: `df$a <- gsub("[[:punct:]]+", "", tolower(df$a))`, or with `tm`: `tolower(removePunctuation(as.character(df$a)))`, which gives `"i love textmining" "here i go again"` – akrun Jan 31 '18 at 14:57

1 Answer


Here is an example using the tm package:

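df <- data.frame(a = c("I love TEXTMINING", "Here I GO, Again!!"))

library(tm)
# Build a corpus with one document per element of df$a
corpus <- Corpus(VectorSource(df$a))
corpus <- tm_map(corpus, removeNumbers)
# Base R functions such as tolower are wrapped in content_transformer()
corpus <- tm_map(corpus, content_transformer(tolower))
# corpus <- tm_map(corpus, removeWords, stopwords('english'))
corpus <- tm_map(corpus, removePunctuation)

# Extract the cleaned text from the corpus
answer <- unlist(as.list(corpus))
answer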
Dave2e
  • Thanks. What if I wanted to append the answer to a new column of df. How would that work? – Molia Jan 31 '18 at 15:47
  • I found that answer here: https://stackoverflow.com/questions/24703920/r-tm-package-vcorpus-error-in-converting-corpus-to-data-frame – Molia Jan 31 '18 at 15:51
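
For the follow-up about a new column, here is a minimal sketch. It assumes the tm transformations are applied directly to the character vector rather than through a corpus; `removePunctuation` also accepts plain character input, as in akrun's comment. The column name `a_clean` is made up for illustration.

```r
library(tm)

df <- data.frame(a = c("I love TEXTMINING", "Here I GO, Again!!"))

# as.character() guards against df$a being a factor (the default in R < 4.0)
cleaned <- removePunctuation(tolower(as.character(df$a)))

# a_clean is a hypothetical column name; the original column is left untouched
df$a_clean <- cleaned
df$a_clean
```

Because the result is an ordinary character vector with one element per row, it can be assigned straight to a column without converting a corpus back to a data frame.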