
I have a data frame (DF) like this:

   word
1  vet clinic New York 
2  super haircut Alabama 
3  best deal on dog drugs
4  doggy medicine Texas
5  cat healthcare
6  lizards that don't lie

I am trying to get the following data frame, with only the geo names removed:

  word 
1 vet clinic
2 super haircut
3 best deal on dog drugs 
4 doggy medicine
5 cat healthcare
6 lizards that don't lie

The following attempt removes whole rows that match a geo name, rather than stripping the name and keeping the remaining words:

vec <- # vector of geo names
DF <-DF[!grepl(vec,DF$word),]
Cybernetic

3 Answers


Using @Ari's variables and data frame, a vectorized method could use Reduce:

vec = c("New York", "Texas", "Alabama")
word = c("vet clinic New York", "super haircut Alabama", "best deal on dog drugs", "doggy medicine Texas", "cat healthcare", "lizards that don't lie")
df = data.frame(word=word)
df$word = as.character(df$word)

# Start from df$word and strip each geo name in turn (literal match, not regex)
Reduce(function(a, b) gsub(b, "", a, fixed = TRUE), vec, df$word)

[1] "vet clinic "            "super haircut "         "best deal on dog drugs" "doggy medicine "       
[5] "cat healthcare"         "lizards that don't lie"
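The output above still carries a trailing space wherever a geo name was removed. A minimal sketch of one way to tidy that up, assuming base R's `trimws()` is available (R 3.2.0 or later); the names `stripped` and `cleaned` are only illustrative:

```r
vec <- c("New York", "Texas", "Alabama")
word <- c("vet clinic New York", "super haircut Alabama", "best deal on dog drugs",
          "doggy medicine Texas", "cat healthcare", "lizards that don't lie")

# Strip each geo name literally, one name per Reduce step
stripped <- Reduce(function(a, b) gsub(b, "", a, fixed = TRUE), vec, word)

# Drop the leading/trailing whitespace left behind by the removal
cleaned <- trimws(stripped)
cleaned
```

This only trims the ends of each string; a geo name removed from the middle of a phrase would still leave a double space.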
lawyeR

As Henrik mentioned, it would have been helpful to include a reproducible example with your post. Here is one:

vec = c("New York", "Texas", "Alabama")
word = c("vet clinic New York", "super haircut Alabama", "best deal on dog drugs", "doggy medicine Texas", "cat healthcare", "lizards that don't lie")
df = data.frame(word=word)
df$word = as.character(df$word)
df

                    word
1    vet clinic New York
2  super haircut Alabama
3 best deal on dog drugs
4   doggy medicine Texas
5         cat healthcare
6 lizards that don't lie

Generally speaking, R gurus prefer vectorization over for loops, but in this case I found a nested for loop with the stringr package to be the easiest way to solve the problem.

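library(stringr)
# Remove every geo name in vec from every row, one (row, name) pair at a time
for (i in seq_len(nrow(df)))
{
  for (j in seq_along(vec))
  {
    # fixed() matches the geo name literally instead of as a regular expression
    df[i, "word"] = str_replace_all(df[i, "word"], fixed(vec[j]), "")
  }
}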
df

                word
1            vet clinic 
2         super haircut 
3 best deal on dog drugs
4        doggy medicine 
5         cat healthcare
6 lizards that don't lie

I believe this gives the result you were looking for, apart from the trailing space left behind wherever a geo name was removed.
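The loop over rows is not strictly required, because `gsub()` already works on a whole character vector at once. A sketch of the same idea with only the loop over geo names, using base R instead of stringr (that substitution is mine, not part of the answer above):

```r
vec <- c("New York", "Texas", "Alabama")
df <- data.frame(
  word = c("vet clinic New York", "super haircut Alabama", "best deal on dog drugs",
           "doggy medicine Texas", "cat healthcare", "lizards that don't lie"),
  stringsAsFactors = FALSE
)

# gsub() is vectorized over df$word, so one pass per geo name covers every row
for (name in vec) {
  df$word <- gsub(name, "", df$word, fixed = TRUE)
}
df$word
```

As with the nested loop, the trailing space after a removed name is left in place.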

Ari

Using @Ari's example,

  library(stringr) 
  df$word <- str_trim(gsub(paste(vec,collapse="|"),"", df$word))
  df$word
 #[1] "vet clinic"             "super haircut"          "best deal on dog drugs"
 #[4] "doggy medicine"         "cat healthcare"         "lizards that don't lie"
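The alternation pattern above can be extended to cope with geo names that appear at the start or in the middle of a phrase. The following is a rough sketch under a few assumptions: the sample strings are made up for illustration, word boundaries (`\\b`) are wanted so that a name does not match inside a longer word, and none of the names contain regex metacharacters (those would need escaping first):

```r
vec <- c("New York", "Texas", "Alabama")
# Hypothetical inputs with the geo name in different positions
word <- c("vet clinic New York", "Texas doggy medicine",
          "haircut Alabama deals", "cat healthcare")

# Single alternation pattern wrapped in word boundaries
pattern <- paste0("\\b(", paste(vec, collapse = "|"), ")\\b")
removed <- gsub(pattern, "", word, perl = TRUE)

# Collapse the runs of whitespace left behind, then trim both ends
cleaned <- trimws(gsub("\\s+", " ", removed, perl = TRUE))
cleaned
```

Collapsing whitespace before trimming avoids the double space that removing a name from the middle of a phrase would otherwise leave.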
akrun