Combining tibble dataframes by common values

Question

After reading the comments, particularly those about how groups would get merged, I realise that what I was asking did not make sense. This is the result that I actually want to achieve in my programme:

I have a tibble dataframe that looks as follows (although my actual dataframe is longer):

    Group   Person       
     <dbl>    <chr>     
1       1   Person 1.1 
2       2   Person 1.2 
3       2   Person 1.2 
4       3   Person 2.1 
5       4   Person 2.1 
6       4   Person 3.1 
7       5   Person 1.2 
8       5   Person 4.1 
9       6   Person 1.2
10      6   Person 4.2

I want the tibble to be split by Group. However, I have a group 2 that just has person 1.2 in it, but as person 1.2 is in group 5 with person 4.1 and in group 6 with person 4.2, I would like to delete group 2. Hence, if there is a group with only one type of person, and that person is also in a group with another person, then the group where they are by themselves should be deleted.

Then the dataframe would look like this:

    Group   Person       
    <dbl>    <chr>     
1       1   Person 1.1 
4       3   Person 2.1 
5       4   Person 2.1 
6       4   Person 3.1 
7       5   Person 1.2 
8       5   Person 4.1 
9       6   Person 1.2
10      6   Person 4.2

Reproducible data for example dataframe above:

df <- structure(list(
  Group = c(1, 2, 2, 3, 4, 4, 5, 5, 6, 6),
  Person = c("Person 1.1", "Person 1.2", "Person 1.2", "Person 2.1",
             "Person 2.1", "Person 3.1", "Person 1.2", "Person 4.1",
             "Person 1.2", "Person 4.2")),
  spec = structure(list(
    cols = list(
      Group = structure(list(), class = c("collector_double", "collector")),
      Person = structure(list(), class = c("collector_character", "collector"))),
    default = structure(list(), class = c("collector_guess", "collector")),
    skip = 1), class = "col_spec"),
  row.names = c(NA, -10L),
  class = c("tbl_df", "tbl", "data.frame"))

Hello :) In order for us to help you, please provide a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example. For example, to produce a minimal data set, you can use `head()`, `subset()`. Then use `dput()` to give us something that can be put in R immediately. Alternatively, you can use base R datasets such as `mtcars`, `iris`, *etc*. — Paul, Aug 18 '20 at 13:49
What happens if, e.g., Group 1 shares a person with Group 2, and Group 2 shares a different person with Group 3? Should they all become one big group? — Gregor Thomas, Aug 18 '20 at 13:56
No, say Group 1 contains person 1, Group 2 contains person 1 and person 2 and Group 3 contains person 2 and person 3. I would want person 1 and person 2 to be grouped together and person 2 and person 3 to be grouped together — Alan20, Aug 18 '20 at 14:01
Thx for data, there is a small typo (so small that I can't edit...), `5 shape 4 Person 1 .2` should be `5 shape 4 Person 1.2` — Paul, Aug 18 '20 at 14:01
This makes me think about something such as [Venn diagram](https://www.r-graph-gallery.com/venn-diagram.html). — Paul, Aug 18 '20 at 14:47
I'm still a bit confused. How is this different from grouping by person? If Group 1 has persons 1.1 and 1.2, and Group 2 has persons 1.1 (shared with Group 1), 2.1, and 3.1, and Group 3 has person 3.1, what is the result? Does person 2.1 get merged with the people in Group 1 or Group 3? — Gregor Thomas, Aug 18 '20 at 14:58

score 1 · Accepted Answer · answered Aug 18 '20 at 15:37

Based on your edits, I would code this by first finding the persons that appear in groups with others (call it persons_with_others), and then filtering out size-1 groups where the person in that group is one of the persons_with_others.

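library(dplyr)

# Persons who appear in at least one group alongside someone else
persons_with_others = df %>%
  group_by(Group) %>%
  filter(n_distinct(Person) > 1) %>%
  pull(Person) %>%
  unique()

# Drop single-person groups whose person also appears in a mixed group
df %>%
  group_by(Group) %>%
  filter(!(n_distinct(Person) == 1 & Person %in% persons_with_others))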
# # A tibble: 7 x 2
# # Groups:   Group [4]
#   Group Person     
#   <dbl> <chr>      
# 1     1 Person 1.1 
# 2     4 Person 2.1 
# 3     4 Person 3.1 
# 4     5 Person 1.2 
# 5     5 Person 4.1 
# 6     6 Person 1.2 
# 7     6 Person 4.2

This result is different than your desired output, but I think it is correct: Group 3 is eliminated because it only contains Person 2.1, and Person 2.1 appears in Group 4 with another person (Person 3.1).
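
If it helps to see why each group is kept or dropped, a per-group summary makes the rule visible. This is only a sketch: it assumes the example data from the question with the `Person 1 .2` typo corrected, and `group_status`, `n_persons`, `persons` and `dropped` are names made up here for illustration.

```r
library(dplyr)

# Example data from the question (assumption: "Person 1 .2" corrected to "Person 1.2")
df <- tibble(
  Group  = c(1, 2, 2, 3, 4, 4, 5, 5, 6, 6),
  Person = c("Person 1.1", "Person 1.2", "Person 1.2", "Person 2.1",
             "Person 2.1", "Person 3.1", "Person 1.2", "Person 4.1",
             "Person 1.2", "Person 4.2")
)

# Persons who share at least one group with somebody else
persons_with_others <- df %>%
  group_by(Group) %>%
  filter(n_distinct(Person) > 1) %>%
  pull(Person) %>%
  unique()

# One row per group: how many distinct persons it has and whether it is dropped
group_status <- df %>%
  group_by(Group) %>%
  summarise(
    n_persons = n_distinct(Person),
    persons   = paste(unique(Person), collapse = ", "),
    dropped   = n_persons == 1 & all(Person %in% persons_with_others),
    .groups   = "drop"
  )

group_status
# Groups 2 and 3 are flagged as dropped; Group 1 stays because
# Person 1.1 never appears together with anyone else.
```

The `.groups` argument needs dplyr 1.0.0 or later; on older versions it can be replaced by a trailing `ungroup()`.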

ThomasIsCoding · Answer 2 · 2020-08-18T15:11:40.303

Here is a base R option:

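# Split the data into one data frame per group
dfs <- split(df, df$Group)
res <- list()
while (length(dfs) > 0) {
  # Seed the merged set with the persons of the first remaining group
  S <- dfs[[1]]$Person
  inds <- 1
  for (k in seq_along(dfs)[-1]) {
    # Merge any later group that shares a person with the set so far
    if (length(intersect(dfs[[k]]$Person, S)) > 0) {
      S <- union(S, dfs[[k]]$Person)
      inds <- c(inds, k)
    }
  }
  # Stack the merged groups and remove them from the work list
  res[[length(res) + 1]] <- do.call(rbind, dfs[inds])
  dfs <- dfs[-inds]
}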

which gives

> res
[[1]]
# A tibble: 1 x 3
  Group Shape   Person
* <dbl> <chr>   <chr>
1     1 shape 1 Person 1.1

[[2]]
# A tibble: 4 x 3
  Group Shape   Person
* <dbl> <chr>   <chr>
1     2 shape 5 Person 1.2
2     2 shape 2 Person 1.2
3     5 shape 4 Person 1.2
4     5 shape 1 Person 4.1

[[3]]
# A tibble: 3 x 3
  Group Shape   Person
* <dbl> <chr>   <chr>
1     3 shape 3 Person 2.1
2     4 shape 3 Person 2.1
3     4 shape 6 Person 3.1
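
One thing to be aware of with this kind of loop: it walks the remaining groups once per seed, so a group that only becomes linked through a later group can be left out, depending on group order. If chained links should always end up together (the case raised in the comments), one possible variant repeats the search until nothing new joins. This is a sketch, not part of the answer above; `merge_linked_groups` and the toy data `chain` are hypothetical names used for illustration.

```r
# Hypothetical helper: merge groups that are linked through shared persons,
# repeating the search until no further group can be added.
merge_linked_groups <- function(df) {
  dfs <- split(df, df$Group)
  res <- list()
  while (length(dfs) > 0) {
    inds <- 1
    S <- dfs[[1]]$Person
    repeat {
      # Remaining groups that share at least one person with the set so far
      hits <- which(vapply(dfs, function(d) any(d$Person %in% S), logical(1)))
      hits <- setdiff(hits, inds)
      if (length(hits) == 0) break
      inds <- c(inds, hits)
      S <- union(S, unlist(lapply(dfs[hits], function(d) d$Person)))
    }
    # Stack the linked groups and remove them from the work list
    res[[length(res) + 1]] <- do.call(rbind, dfs[sort(inds)])
    dfs <- dfs[-inds]
  }
  res
}

# Toy chain: Group 1 and Group 2 share nobody directly,
# but Group 3 shares "A" with Group 1 and "B" with Group 2.
chain <- data.frame(
  Group  = c(1, 2, 2, 3, 3),
  Person = c("A", "B", "C", "A", "B")
)

merged <- merge_linked_groups(chain)
length(merged)  # all three groups end up in one element
```

On this toy data a single pass would put Group 2 in a separate element, because Group 2 is checked before Group 3 has added "B" to the set.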
