R: match two data frames and create new column in one

Question

I have two data frames.
One is named "Friends" with the person's id and his/her friends, while the other one is named "Likes" describing the person's actions as shown below.
I want to create a new column in the second data frame to indicate whether the person likes his/her friend's post or not.
How can I do it in R? Thank You.

Friends

     id    friend's name
     1     A
     1     B
     1     C
     2     B
     2     E
     2     H
     3     A
     3     F

Likes

     id     title
     1      1 likes A's post.
     1      1 likes F's post.
     2      2 likes G's post.
     3      3 likes F's post.
     3      3 likes B's post.

So I want to create a new column named "likefriendspost", so that the data frame looks like this:

     id     title                     likefriendspost
     1      1 likes A's post.         1
     1      1 likes his own post.     0
     2      2 likes G's post.         0
     3      3 likes F's post.         1
     3      3 likes B's post.         0

Read about [merge](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) — zx8754, Oct 08 '20 at 11:19
You may have one more table where Details of users are there, so that `1=F` — AnilGoyal, Oct 08 '20 at 14:54

Ben · Accepted Answer · 2020-10-09T12:07:28.827

As mentioned in the comments, merging your two data frames together will be helpful here.

This is one approach using the tidyverse packages.

First, extract the friend's name from title using an appropriate regex pattern.
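To see what the extraction step does in isolation, here is a minimal base R sketch. The titles are copied from the Data section of this answer, and the trailing period in the pattern is escaped (`\\.`) so that it only matches a literal dot. The vector names `title` and `name` are just for illustration.

```r
# Titles copied from the Data section of this answer
title <- c("1 likes A's post.", "1 likes his own post.", "2 likes G's post.",
           "3 likes F's post.", "3 likes B's post.")

# Capture the text between "likes " and " post.", dropping an optional "'s".
# The lazy quantifier (.+?) keeps "'s" out of the captured group.
name <- sub(".*likes (.+?)('s)? post\\.", "\\1", title)

name
# "A" "his own" "G" "F" "B"
```

Titles that do not name a friend (such as "his own") still yield a string, but it will simply not match any friend_name later on.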

Then, merge the two data frames with left_join and, for each id and title, check whether name is included in friend_name (from the friends data frame).

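library(tidyverse)

likes %>%
  # pull the liked name out of title, e.g. "1 likes A's post." -> "A"
  mutate(name = sub(".*likes (.+?)('s)? post\\.", "\\1", title)) %>%
  # attach every friend of each id (one row per like/friend pair)
  left_join(friends, by = "id") %>%
  group_by(id, title, name) %>%
  # 1 if the liked name is among that id's friends, otherwise 0
  summarise(likefriendpost = +any(name %in% friend_name), .groups = "drop")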
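If you would rather not load any packages, roughly the same join-and-flag logic can be sketched in base R with merge(), as suggested in the comments. This is only a sketch: `friends_flag` and `out` are hypothetical names introduced here, and the data is retyped from the Data section of this answer. Instead of joining on id alone and then checking membership, it joins on both id and the extracted name and marks the pairs that were found.

```r
likes <- data.frame(
  id = c(1, 1, 2, 3, 3),
  title = c("1 likes A's post.", "1 likes his own post.", "2 likes G's post.",
            "3 likes F's post.", "3 likes B's post."),
  stringsAsFactors = FALSE
)
friends <- data.frame(
  id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  friend_name = c("A", "B", "C", "B", "E", "H", "A", "F"),
  stringsAsFactors = FALSE
)

# Extract the liked name from title
likes$name <- sub(".*likes (.+?)('s)? post\\.", "\\1", likes$title)

# Hypothetical helper copy of friends: every (id, friend_name) pair is marked 1
friends_flag <- transform(friends, likefriendpost = 1L)

# Left join on both id and name; pairs missing from friends come back as NA
out <- merge(likes, friends_flag,
             by.x = c("id", "name"), by.y = c("id", "friend_name"),
             all.x = TRUE)

# Anything that did not match a friend is not a friend's post
out$likefriendpost[is.na(out$likefriendpost)] <- 0L
out
```

Note that merge() may reorder the rows, so compare results by title rather than by position.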

Edit: Here is a data.table version.

First, initialize likefriendpost to zero in the likes table and extract friend_name in a similar fashion. Then, use setkey to sort both tables and prepare them for the join on id and friend_name. Finally, set likefriendpost to 1 for the rows of likes that match a row of friends on both id and friend_name.

library(data.table)

setDT(likes)
setDT(friends)

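# start every row at 0 and extract the liked name from title
likes[, `:=`(likefriendpost = 0, friend_name = sub(".*likes (.+?)('s)? post\\.", "\\1", title))]

# sort both tables by the join columns
setkey(likes, id, friend_name)
setkey(friends, id, friend_name)

# update join: rows of likes that match friends on id and friend_name get 1
likes[friends, on = .(id, friend_name), likefriendpost := 1]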
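For anyone who wants to run the data.table steps end to end, here is a self-contained sketch (it needs the data.table package installed). Two things differ from the code above and are my assumptions rather than part of the original answer: the ids are typed as integers in both tables so the join columns have the same type, and setkey is left out, since `on =` performs an ad hoc join without keys and the row order of likes is then left unchanged.

```r
library(data.table)

# Data retyped from the Data section, with integer ids in both tables
likes <- data.table(
  id = c(1L, 1L, 2L, 3L, 3L),
  title = c("1 likes A's post.", "1 likes his own post.", "2 likes G's post.",
            "3 likes F's post.", "3 likes B's post.")
)
friends <- data.table(
  id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  friend_name = c("A", "B", "C", "B", "E", "H", "A", "F")
)

# Start every row at 0 and extract the liked name from title
likes[, `:=`(likefriendpost = 0L,
             friend_name = sub(".*likes (.+?)('s)? post\\.", "\\1", title))]

# Update join: only rows of likes that match an (id, friend_name) pair
# in friends are modified; all other rows keep their 0
likes[friends, on = .(id, friend_name), likefriendpost := 1L]

likes[]
```

Because `:=` modifies likes by reference, no reassignment is needed; the trailing `likes[]` just prints the updated table.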

Output (from the tidyverse version)

     id title                 name    likefriendpost
  <dbl> <chr>                 <chr>            <int>
1     1 1 likes A's post.     A                    1
2     1 1 likes his own post. his own              0
3     2 2 likes G's post.     G                    0
4     3 3 likes B's post.     B                    0
5     3 3 likes F's post.     F                    1

Data

likes <- structure(list(id = c(1, 1, 2, 3, 3), title = c("1 likes A's post.", 
"1 likes his own post.", "2 likes G's post.", "3 likes F's post.", 
"3 likes B's post.")), class = "data.frame", row.names = c(NA, 
-5L))

friends <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), friend_name = c("A", 
"B", "C", "B", "E", "H", "A", "F")), class = "data.frame", row.names = c(NA, 
-8L))

Thank you. However, if the data frames are large, how to use data.table instead of tidyverse? — CS Tsai, Oct 09 '20 at 09:35
@SRT. See edited answer for `data.table` approach. Hope this is helpful. — Ben, Oct 09 '20 at 12:07
