I am new to R programming and wrote a program to remove stopwords:

require(tm)
data<-read.csv('remm.corp')
print(data)

path<-"/home/cloudera/saicharan/R/text.txt"
aaa<-readLines(path)

bbb<-Corpus(VectorSource(aaa))
#inspect(bbb)

bbb<-tm_map(bbb,removeWords,stopwords("english"))
write.csv(as.character(bbb[[1]]),'e.csv')

I tried writing the data to a file, but only a single line was written. How should I modify the code to write multiple lines? Please help.

MrFlick
sk79
    What would be on these "multiple lines"? It would be better to give a minimal [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample data (that isn't only on your local computer) and clearly show the expected output. – MrFlick Mar 22 '16 at 12:19

1 Answer

Your code writes only bbb[[1]], which is the first document in the corpus, so only one line ends up in the file. One way to save the whole corpus is to convert it to a data frame first and then save that as a CSV file. Since you didn't provide sample text, I created some reproducible text. The code below first creates a corpus from the sample text and then removes the stop words. The corpus is structured as a list, with each document's text stored in its content element. The code extracts just the text, builds a data frame from it, and finally saves the data frame.

Code:

# Reproducible data - quotes from As You Like It by William Shakespeare
SampleText <- c("All the world's a stage,And all the men and women merely players;They have their exits and their entrances;And one man in his time plays many parts,
His acts being seven ages.",
          "Men have died from time to time, and worms have eaten them, but not for love.",
          "Love is merely a madness.")

library(tm)
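mycorpus <- Corpus(VectorSource(SampleText)) # Corpus creation
mycorpus <- tm_map(mycorpus, removeWords, stopwords("english")) # Remove English stop words

# Pull the "content" element out of every document and collect the text in one column
mycorpus_dataframe <- data.frame(text=unlist(sapply(mycorpus, `[`, "content")), 
                      stringsAsFactors=FALSE)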

write.csv(mycorpus_dataframe,'mycorpus_dataframe.csv', row.names=FALSE)

Output:

> print(mycorpus_dataframe , row.names=FALSE)
                                                                                                                                     text
 All  world's  stage,And   men  women merely players;They   exits   entrances;And one man   time plays many parts,\nHis acts  seven ages.
                                                                                          Men  died  time  time,  worms  eaten ,    love.
                                                                                                                   Love  merely  madness.

> 
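
If a plain text file with one cleaned document per line is enough, the following is a minimal sketch of an alternative, not a replacement for the data frame approach. It assumes each document is a single string, uses VCorpus rather than Corpus, and adds stripWhitespace to collapse the gaps left by removed words. The names sample_text, cleaned and out_file are made up for illustration, and the output path is a temporary file.

```r
library(tm)

# Sample documents (hypothetical input; two are reused from the quotes above)
sample_text <- c(
  "All the world's a stage.",
  "Men have died from time to time, and worms have eaten them, but not for love.",
  "Love is merely a madness."
)

# Build the corpus, drop English stop words, then collapse repeated spaces
corpus <- VCorpus(VectorSource(sample_text))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Convert every document (not just corpus[[1]]) to a single string
cleaned <- vapply(corpus,
                  function(doc) paste(as.character(doc), collapse = " "),
                  character(1), USE.NAMES = FALSE)

# Write one document per line; out_file is an assumed, temporary location
out_file <- file.path(tempdir(), "cleaned_text.txt")
writeLines(trimws(cleaned), out_file)
```

Note that a document containing its own line breaks would be spread over several lines by writeLines, so the CSV route is probably the safer choice for text like the first quote above.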
amitkb3