Creating new column with intersecting words from two other columns in R

Question

I am looking to create a new column from the intersecting words from two other columns containing strings:

sometext1 <- c('this is a text entry','here is another text entry','something else')
sometext2 <- c('text entry','text entry','no match here')
texts <- data.frame(sometext1=sometext1, sometext2=sometext2,stringsAsFactors=F)

This is my attempt that didn't produce any match:

texts$common <- paste(Reduce(intersect, list(strsplit(texts$sometext1,' '), strsplit(texts$sometext2,' '))), sep=" ", collapse=" ")

texts$common should look something like this:

1     'text entry'
2     'text entry'
3     ''

Thanks!!

BTW, you could avoid the need to convert `sometext1` and `sometext2` to character by using argument `stringsAsFactors=F` in the `data.frame` command. — Marat Talipov, Feb 03 '15 at 18:43
Also, did you check this link: http://stackoverflow.com/questions/16196327/find-common-substrings-between-two-character-variables ? — Marat Talipov, Feb 03 '15 at 18:45
A three-step approach with base R would be: `x <- lapply(texts, strsplit, " "); x <- Map(intersect, x[[1]], x[[2]]); texts$common <- sapply(x, paste0, collapse = " ")` — talat, Feb 03 '15 at 18:46
@docendo discimus, yes, that's what I was looking for. Can you get the results back into the data frame column texts$common and make it an answer so I can check it? — amunategui, Feb 03 '15 at 18:51
@amunategui, it should already be back in the data.frame after running those three commands. Will post as answer — talat, Feb 03 '15 at 18:55
@docendodiscimus: your handle choice speaks eloquently to the summum bonum. Vinimus, vidicus, vcodeRus — lawyeR, Feb 03 '15 at 21:14
@lawyeR, you got it! I should add that to my "about me" section :D — talat, Feb 03 '15 at 21:17
@docendodiscimus: I wish I knew a fraction of what you know so I could help others with R like you do. And, the garbled Julius Caesar may be completely wrong, I should note, but it struck me as funny. Two years of Latin 48 years ago starts to wear off on the declensions. — lawyeR, Feb 03 '15 at 21:20
@lawyeR, I started learning R a little over a year ago and most of what I know by now is from following SO questions and trying to answer them. You are already active here so you'll quickly learn more functions, I'm sure :-) This is actually also what my user name is supposed to say - learning by "teaching" (=answering). — talat, Feb 03 '15 at 21:26
@docendodiscimus: it is frustrating to see a question that I could tackle, but Bonded Dust, Richard Scriven, Mr. Flick, or akrun (or all four of them + you) have answered it already. Naturally, the OP wants an answer ASAP. I lack the discipline not to peek at answers. Perhaps SO needs a "Blank out the Answer and Let Me Think on My Own" button. It is seductive to get caught up in gaining rep. — lawyeR, Feb 03 '15 at 21:48

score 3 · Accepted Answer · answered Feb 03 '15 at 19:01

Starting from this data.frame:

> texts
#                   sometext1     sometext2
#1       this is a text entry    text entry
#2 here is another text entry    text entry
#3             something else no match here

You could use the following approach. Start by splitting every entry of each column on spaces, using lapply (a data.frame is a list of columns, so lapply visits each column in turn):

x <- lapply(texts, strsplit, " ")
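To make the shape of x concrete, here is a small sketch that rebuilds the data frame from the question and inspects the result of the split. It only uses base R; the inspection calls are added for illustration and are not part of the original answer.

```r
# Rebuild the example data frame from the question
texts <- data.frame(
  sometext1 = c("this is a text entry", "here is another text entry", "something else"),
  sometext2 = c("text entry", "text entry", "no match here"),
  stringsAsFactors = FALSE
)

# One list element per column; each element is itself a list with
# one character vector of words per row
x <- lapply(texts, strsplit, " ")

names(x)          # the column names of texts
length(x[[1]])    # one entry per row of texts
x$sometext1[[1]]  # the words of row 1, column sometext1
```

So x is a list of two lists, and the words of row i live in x[[1]][[i]] and x[[2]][[i]].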

Then, use Map to apply intersect pairwise, row by row: the i-th sub-element of the first element in x (x[[1]], the words from the first column in texts) is intersected with the i-th sub-element of the second element in x (x[[2]], the words from the second column in texts):

x <- Map(intersect, x[[1]], x[[2]])
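A minimal sketch of what Map does here, using two hand-written word lists (words1 and words2 are illustrative names, not from the original). It also shows, as far as I can tell, why the attempt in the question found nothing: calling intersect directly on the two lists compares whole list elements, not the words inside them.

```r
# Two small lists of word vectors, standing in for x[[1]] and x[[2]]
words1 <- list(c("this", "is", "a", "text", "entry"), c("something", "else"))
words2 <- list(c("text", "entry"), c("no", "match", "here"))

# Map pairs up the i-th elements and calls intersect on each pair
common <- Map(intersect, words1, words2)
common[[1]]          # words shared by the first pair
length(common[[2]])  # 0: the second pair shares no words (character(0))

# intersect on the lists themselves looks for identical whole vectors,
# and no vector in words1 equals a vector in words2
whole <- intersect(words1, words2)
length(whole)
```

Rows without any shared word end up as character(0), which the next step turns into an empty string.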

Finally, use sapply to run through the list and paste/collapse the elements together and write them into the new column:

texts$common <- sapply(x, paste0, collapse = " ")

Result is:

> texts
#                   sometext1     sometext2     common
#1       this is a text entry    text entry text entry
#2 here is another text entry    text entry text entry
#3             something else no match here
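If this is needed in more than one place, the three steps could be wrapped in a small helper. The sketch below is one possible way to do that; common_words is a hypothetical name, as.character is added on the assumption that the columns might be factors, and vapply replaces sapply only to guarantee a character vector comes back.

```r
# Hypothetical helper: words shared by a[i] and b[i], joined back into one string
common_words <- function(a, b, sep = " ") {
  # as.character guards against factor columns (an assumption, not in the original)
  a_words <- strsplit(as.character(a), sep, fixed = TRUE)
  b_words <- strsplit(as.character(b), sep, fixed = TRUE)
  # intersect row by row, then collapse each result; character(0) becomes ""
  vapply(Map(intersect, a_words, b_words), paste, character(1),
         collapse = sep, USE.NAMES = FALSE)
}

texts <- data.frame(
  sometext1 = c("this is a text entry", "here is another text entry", "something else"),
  sometext2 = c("text entry", "text entry", "no match here"),
  stringsAsFactors = FALSE
)
texts$common <- common_words(texts$sometext1, texts$sometext2)
texts$common
```

Note that intersect drops duplicates and keeps the word order of its first argument, so a word repeated in sometext1 appears only once in common.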
