var1 is a character vector

var1 <- c("tax evasion", "all taxes", "payment")

and var2 is another character vector

var2 <- c("bill", "income tax", "sales taxes")

I want to compare var1 and var2 and extract the terms that have a partial word match. For example, the desired answer in this case is the following character vector:

"tax evasion", "all taxes", "income tax", "sales taxes"

I tried

sapply(var1, grep, var2, ignore.case=T,value=T)

but I am not getting the desired answer. How can it be done?

Thanks.

Cyrus
user6633625673888

2 Answers

You can do this (I use the magrittr package for clarity of the code):

library(magrittr)

findIn = function(u, v)
{
    # Split u into words, then return every element of v matching any word
    strsplit(u, ' ') %>%
        unlist %>%
        sapply(grep, value=TRUE, x=v) %>%
        unlist %>%
        unique
}

unique(c(findIn(var1, var2), findIn(var2, var1)))
#[1] "income tax"  "sales taxes" "tax evasion" "all taxes"
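
For readers who prefer to avoid extra packages, the same idea can be sketched in base R. This is only an illustration under a few assumptions: `find_in` is a hypothetical name, and `fixed = TRUE` is an added choice so that each word is treated as a literal string rather than a regular expression, which the original function does not do.

```r
var1 <- c("tax evasion", "all taxes", "payment")
var2 <- c("bill", "income tax", "sales taxes")

# Hypothetical base R counterpart of findIn(); no magrittr required.
find_in <- function(u, v) {
  # Break every element of u into its individual words
  words <- unlist(strsplit(u, " ", fixed = TRUE))
  # For each word, keep the elements of v that contain it as a literal substring
  hits <- lapply(words, function(w) grep(w, v, value = TRUE, fixed = TRUE))
  # Flatten and drop repeated matches
  unique(unlist(hits))
}

# Search in both directions and drop duplicates across the two results
result <- unique(c(find_in(var1, var2), find_in(var2, var1)))
print(result)
```

Matching literally avoids surprises when a term contains characters such as `.` or `(`, at the cost of losing regular-expression patterns in the input.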
Colonel Beauvel
  • In 2 minutes I effectively had the time to copy it all and format it :) I developed it on my side, but you were quicker; I did not see your answer when posting. Btw, if the two lists have a common sentence, you need unique at the end. – Colonel Beauvel May 24 '15 at 08:37
  • Yes, you are right, the `unique` is needed at the end. I didn't mean that you copied. I saw a similarity, so I commented. – akrun May 24 '15 at 08:38
  • @akrun why did you delete your answer? – user6633625673888 May 24 '15 at 08:44
  • @akrun people also upvote and accept answers that do not use additional packages (e.g. magrittr). Your answer did not use additional packages, although Colonel Beauvel's answer is useful too. – user6633625673888 May 24 '15 at 08:48
  • @john If you insist, I will undelete it, though I liked ColonelBeauvel's elegant take on the problem. – akrun May 24 '15 at 08:49
Maybe you need

lst1 <- strsplit(var1, ' ')
lst2 <- strsplit(var2, ' ')

indx1 <- sapply(lst1, function(x) any(grepl(paste(unlist(lst2), 
       collapse="|"), x)))
indx2 <- sapply(lst2, function(x) any(grepl(paste(unlist(lst1),
       collapse="|"), x)))
c(var1[indx1], var2[indx2])
#[1] "tax evasion" "all taxes"   "income tax"  "sales taxes"

If var1 and var2 have elements in common, wrap the result with unique, as @ColonelBeauvel did in his elegant solution.
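
To illustrate when that matters, here is a small sketch with made-up input in which "all taxes" appears in both vectors. The modified `var2`, and the names `pat1`, `pat2`, `combined` and `deduped`, are assumptions for this example only.

```r
var1 <- c("tax evasion", "all taxes", "payment")
var2 <- c("bill", "all taxes", "income tax")  # hypothetical: shares "all taxes" with var1

lst1 <- strsplit(var1, " ")
lst2 <- strsplit(var2, " ")

# One alternation pattern per vector, built from all of its words
pat1 <- paste(unlist(lst1), collapse = "|")
pat2 <- paste(unlist(lst2), collapse = "|")

# TRUE where any word of the element matches a word from the other vector
indx1 <- sapply(lst1, function(x) any(grepl(pat2, x)))
indx2 <- sapply(lst2, function(x) any(grepl(pat1, x)))

combined <- c(var1[indx1], var2[indx2])  # "all taxes" is selected from both vectors
deduped <- unique(combined)              # keeps the first occurrence only
print(deduped)
```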

akrun
  • Thank you akrun and Colonel Beauvel. Both your answers are elegant, although personally I prefer answers that use fewer or no additional packages. – user6633625673888 May 24 '15 at 08:50