I have managed to read in a text file, but I want to remove certain words from it. I have tried read.table but have no idea how to use it to remove words. How can I remove all of these words using the R console? I have two files: sk.text, which is a whole document, and bash.txt, which contains only the words. I want to remove every word in sk.text that matches a word listed in bash.txt. There are 300 words in total; these are some of them:

 with
 within
 without
 work
 worked
 working
 works
 would
Mr nerd
  • So you want to read a file in, remove certain words, and write a file out? – tumultous_rooster Nov 16 '15 at 19:13
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Nov 16 '15 at 19:15
  • Yeah, I have two files: one is sk.text, which is a whole document, and the other is bash.txt, which has just words. I want to remove all the words in sk.text that match the words given in bash.txt. – Mr nerd Nov 16 '15 at 19:19
  • Please edit the question. Do not put extra info in the comments. – Jaap Nov 16 '15 at 19:20

1 Answer


A simple way would be to use

gsub(paste0('\\b',
            YOURVECTOROFWORDSTOREMOVE,
            '\\b', collapse = '|'),'',YOURSTRING)

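which replaces every occurrence of any word in the vector with an empty string. The `\\b` on each side matches a word boundary (a space, punctuation, or the start/end of the string), so a word is only removed where it stands on its own, and "work" is not stripped out of "workshop". Note that the spaces around a removed word are left behind.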
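Here is a minimal, self-contained sketch of the same gsub idea wrapped in a function. `remove_words` and `stop_words` are hypothetical names, and it assumes the entries in bash.txt are plain words with no regex metacharacters. It trims the leading spaces that the word list in the question appears to have, and collapses the leftover spaces afterwards. Matching is case-sensitive; add `ignore.case = TRUE` to the `gsub` call if "Would" should also go.

```r
# Remove whole-word matches of any entry in `words` from `text`.
# Assumes the entries are plain words (no regex metacharacters like . or +).
remove_words <- function(text, words) {
  words <- trimws(words)           # readLines keeps the leading spaces from bash.txt
  words <- words[nzchar(words)]    # drop blank lines
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  out <- gsub(pattern, "", text, perl = TRUE)
  # gsub leaves the surrounding spaces behind: collapse runs and trim the ends
  trimws(gsub("[[:space:]]+", " ", out))
}

stop_words <- c(" with", " within", " without", " work", " worked",
                " working", " works", " would")
remove_words("She would work without a break, within reason.", stop_words)
# [1] "She a break, reason."

# With the files from the question (the output file name is hypothetical):
# txt   <- readLines("sk.text")
# words <- readLines("bash.txt")
# writeLines(remove_words(txt, words), "sk_clean.txt")
```

`readLines` returns one string per line, and `gsub` is vectorised, so the whole document is processed in one call.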

But you might want to look at the tm package and work with a corpus object if you have many files like this. There you can remove the words you want with

tm_map(YOURCORPUS, removeWords, YOURVECTOROFWORDSTOREMOVE) 
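A rough sketch of the tm route, assuming the tm package is installed. The documents and `stop_words` vector are made-up examples. `removeWords` leaves the surrounding spaces behind, so `stripWhitespace` is applied afterwards. Like the gsub version, it is case-sensitive, which is why "Works" survives in the second document; `tm_map(corpus, content_transformer(tolower))` before removing words is a common way around that.

```r
library(tm)

docs <- c("I would work within limits", "Works without a plan")
stop_words <- c("with", "within", "without", "work", "worked",
                "working", "works", "would")

corpus <- VCorpus(VectorSource(docs))            # one document per element
corpus <- tm_map(corpus, removeWords, stop_words) # drop whole-word matches
corpus <- tm_map(corpus, stripWhitespace)         # collapse leftover spaces

# Pull the cleaned text back out as a character vector
cleaned <- vapply(seq_along(corpus),
                  function(i) content(corpus[[i]]), character(1))
cleaned
# [1] "I limits"     "Works a plan"
```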
OganM