I have seen examples of using gsub and sapply to remove words from a dataframe.

Is there a solution using map from the purrr library?

library(purrr)
library(tibble)  # tibble() is exported by tibble (and dplyr), not by purrr
ID<-c(1,2)
Text_W<-c("I love vegetables, and some fruits","Can we meet tomorrow for a movies, and other fun activities") 
new_tab<-tibble(ID,Text_W)
remove_words<-c("love", "and")

I tried these with no success:

#gsub from base
map_chr(new_tab$Text_W,~paste(gsub(remove_words,"")))


library(stringr)
#
map(new_tab$Text_W,~paste(str_replace_all(remove_words,"")))

Any help will be appreciated.

Roman
Beginner

3 Answers

1

It is not necessary to use map. Simply try:

new_tab %>% 
  mutate(Text_New=str_replace_all(Text_W, paste(remove_words,collapse = "|"),""))
# A tibble: 2 x 3
     ID Text_W                                                      Text_New                                                
  <dbl> <chr>                                                       <chr>                                                   
1    1. I love vegetables, and some fruits                          I  vegetables,  some fruits                             
2    2. Can we meet tomorrow for a movies, and other fun activities Can we meet tomorrow for a movies,  other fun activities

Please note that I collapsed remove_words into a single regular expression, joining the words with the alternation ("or") operator |, using paste(remove_words,collapse = "|").
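As a quick illustration of the collapse step, here is a minimal base R sketch that only builds and inspects the pattern. The variable name `pattern` is an assumption introduced for illustration; it does not appear in the answer.

```r
# Words to remove, as in the question
remove_words <- c("love", "and")

# Join the words with "|" so that the regex matches any one of them
pattern <- paste(remove_words, collapse = "|")
pattern
# [1] "love|and"
```

The same call works for a longer vector such as a stop-word list, since paste simply joins every element with the separator given in collapse.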

Roman
  • Thank you for all your invaluable help. Quick question: if I change remove_words<-c("love", "and") to include "om", i.e. remove_words<-c("love", "and", "om"), can I still use str_replace_all without removing "om" from "some" in the first element? – Beginner Apr 10 '18 at 10:53
1

You are mainly missing the . that refers to the function argument:

> map_chr(new_tab$Text_W, ~gsub("love|and", "", .))
[1] "I  vegetables,  some fruits"                             
[2] "Can we meet tomorrow for a movies,  other fun activities"

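Also note gsub("love|and", ...) instead of gsub(c("love","and"), ...): gsub expects a single pattern, so when it is given a vector it uses only the first element and issues a warning.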

Edit

If you want to use a vector of words to be removed, instead of typing love|and, do

map_chr(new_tab$Text_W, ~gsub(paste(remove_words, collapse="|"), "", .))
Stéphane Laurent
  • Why use paste? Is there a need? – Onyambu Apr 10 '18 at 08:42
  • @StéphaneLaurent - how do I get remove_words into the form "love|and" programmatically in R? In my real example I am using tm::stopwords("english"). How do I get this list into the "x|y|z|.." format? Thank you – Beginner Apr 10 '18 at 11:22
  • @Beginner Not sure I understand. Maybe `paste(remove_words,collapse = "|")`, as in @Jimbou's answer. – Stéphane Laurent Apr 10 '18 at 11:24
  • @StéphaneLaurent perfect - thank you. One of the words in tm::stopwords("english") is **or**. Suppose remove_words is updated to include "or": remove_words<-c("love", "and", "or"). Using str_replace_all or gsub then seems to remove "or" from "tomorrow" as well. What function can I use to ensure this does not happen? Thank you – Beginner Apr 10 '18 at 11:34
  • @Beginner I see what you mean. Maybe simply add spaces: ` remove_words<-c(" love ", " and ", " or ")`. – Stéphane Laurent Apr 10 '18 at 11:39
  • @Beginner like this: `paste0(" ", tm::stopwords("english"), " ")` – Stéphane Laurent Apr 10 '18 at 11:42
  • or `paste0("\\b", tm::stopwords("english"), "\\b")` – Stéphane Laurent Apr 10 '18 at 11:45
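The word-boundary idea from the comments can be sketched in base R as follows. This is an illustration under stated assumptions, not code from the thread: the names `naive`, `whole`, and `cleaned` are hypothetical, "or" and "om" are added to the word list only to reproduce the substring problem raised above, and the alternation is wrapped in one group with a single pair of `\\b` anchors rather than anchoring each word separately.

```r
# Sample text from the question
Text_W <- c("I love vegetables, and some fruits",
            "Can we meet tomorrow for a movies, and other fun activities")

# "or" and "om" are added here only to demonstrate the substring problem
remove_words <- c("love", "and", "or", "om")

# Naive pattern: matches the words anywhere, even inside longer words
naive <- paste(remove_words, collapse = "|")
gsub(naive, "", Text_W)  # also damages "some", "tomorrow" and "for"

# Whole-word pattern: \\b requires a word boundary on both sides of the match
whole <- paste0("\\b(", paste(remove_words, collapse = "|"), ")\\b")
cleaned <- gsub(whole, "", Text_W, perl = TRUE)
cleaned
```

With the anchored pattern, "some" and "tomorrow" should be left intact while the standalone "love" and "and" are still removed. The same pattern string can be passed to str_replace_all. Unlike padding the words with spaces, word boundaries also match next to punctuation and at the start or end of the string.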
1

I would do it one of these ways; I would not use purrr for this one.

library(purrr)
library(dplyr)
library(stringr)

ID<-c(1,2)
Text_W<-c("I love vegetables, and some fruits","Can we meet tomorrow for a movies, and other fun activities") 
new_tab<-tibble(ID,Text_W)
remove_words<-c("love", "and")

# This is basic, if you are only doing it for one column, see Jimbou's note on collapse
new_tab %>% 
  mutate(Text_W = str_replace_all(Text_W, paste(remove_words,collapse = "|"),""))

# This is more scalable, as you can put other columns in the `vars()` method
new_tab %>% 
  mutate_at(vars(Text_W), str_replace_all, paste(remove_words, collapse = "|"), "")

# This is also scalable, but uses base R in case you don't want to load stringr
# (gsub rather than sub, because sub only replaces the first match in each string)
new_tab %>% 
  mutate_at(vars(Text_W), gsub, 
            pattern = paste(remove_words, collapse = "|"), 
            replacement = "")
Zafar