I have a data frame df with multi-word strings in var1. For each row, I want to keep only the words of var1 that also appear in the string chr. For example, the first row of var1 is "car tv dog", and I want to drop the word "dog" because it is not in chr.

My dataframe:

id <- c(1,2,3)
var1 <- c("car tv dog","cat water mouse","pen wire fish")
df <- data.frame(id,var1)

Words I want to keep:

chr <- "car aaa bbb ccc ddd qqq www eee rrr pen cat ttt fish tv"

Desired result:

want <- c("car tv","cat","pen fish")
dfWant <- data.frame(id, var1, want) 

Any help will be much appreciated.

DanY
Denis P
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Sep 06 '18 at 16:27
  • I don't understand how you want to create the second data frame. What's the pattern? Also, what's `chr` doing? – camille Sep 06 '18 at 16:27
  • This is very unclear. Remember that people who read this question are not immersed in the problem like you are. Explain it as if we didn't already know what you are trying to do. – John Coleman Sep 06 '18 at 16:29
  • Sorry, I should have been more specific. The want column is the intersection of the words in the string chr and the words in var1. That is, for the first row, car and tv are present in chr while dog is not, so only car and tv go in the want column. – Denis P Sep 06 '18 at 16:31
  • I'm usually with you @RichScriven, but since he took to commenting instead of editing (and I think the lack of editing was starting to earn him down-votes), I just went ahead. – DanY Sep 06 '18 at 16:46

1 Answer

Code:

# example data
df <- data.frame(
    id = 1:3,
    var1 = c("car tv dog", "cat water mouse", "pen wire fish"),
    stringsAsFactors = FALSE
)

# strings to search for (save each word as an element of a vector)
chr <- "car aaa bbb ccc ddd qqq www eee rrr pen cat ttt fish tv"
chr_vec <- unique(unlist(strsplit(chr, " ")))

# split var1 into words, check if each word is in chr_vec,
# keep only the matches, and re-combine into a multi-word string
df$result <- unlist(lapply(
    strsplit(df$var1, " "),
    function(x) paste(x[x %in% chr_vec], collapse = " ")
))

Result:

> df
  id            var1   result
1  1      car tv dog   car tv
2  2 cat water mouse      cat
3  3   pen wire fish pen fish
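
If you need this for more than one column, the same split / filter / paste steps could be wrapped in a small function. The sketch below is only an illustration: the name keep_words is hypothetical (it is not from the code above), and it assumes that words are separated by whitespace. It also converts its input with as.character(), since data.frame() created factor columns by default before R 4.0.0 and strsplit() does not accept factors.

```r
# Hypothetical helper (name and arguments are assumptions for illustration):
# keep only the words of each string in `x` that also appear in the
# whitespace-separated string `keep`.
keep_words <- function(x, keep) {
  # unique words to keep
  keep_vec <- unique(strsplit(trimws(keep), "\\s+")[[1]])
  # one character vector of words per element of x
  words <- strsplit(trimws(as.character(x)), "\\s+")
  # filter each word vector and paste the survivors back together
  vapply(
    words,
    function(w) paste(w[w %in% keep_vec], collapse = " "),
    character(1)
  )
}

df <- data.frame(
  id = 1:3,
  var1 = c("car tv dog", "cat water mouse", "pen wire fish"),
  stringsAsFactors = FALSE
)
chr <- "car aaa bbb ccc ddd qqq www eee rrr pen cat ttt fish tv"

df$result <- keep_words(df$var1, chr)
```

Using vapply() with character(1) instead of unlist(lapply(...)) is a design choice here: it guarantees exactly one string per row, so a row with no matching words yields an empty string rather than silently shortening the result.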
DanY