Suppose I have the following data frame:

library(tidyverse)

df <- tibble(event_id = c(1, 1, 2, 3, 3, 3),
             person_id = c(1, 2, 3, 1, 4, 5))

event_id   person_id
1          1
1          2
2          3
3          1
3          4
3          5

What I would like is to end up with a data frame that looks like this...

event_id     person_1    person_2
1            1           2
2            3           3
3            1           4
3            1           5
3            4           5

How can I do this? Preferably using the tidyverse if possible.

drizzle123

3 Answers

Here is an approach that uses the tidyverse, though I'm not sure I'm following the rules of your intended edge list.

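  1. Create a helper function
f <- function(p) {
  # Name the two elements of each pair
  sn <- function(x) setNames(x, c("person_1", "person_2"))
  # A lone person is paired with themselves
  if (length(p) < 2) return(list(sn(c(p, p))))
  # Otherwise return every unordered pair as a list of named vectors
  combn(p, 2, sn, simplify = FALSE)
}
  2. Apply f() to each event_id and unnest_wider()
df %>% group_by(event_id) %>%
  # dplyr >= 1.1.0 deprecates multi-row summarize() results in favor of reframe()
  summarize(k = f(person_id)) %>%
  unnest_wider(k)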

Output:

  event_id person_1 person_2
     <dbl>    <dbl>    <dbl>
1        1        1        2
2        2        3        3
3        3        1        4
4        3        1        5
5        3        4        5
langtang

You can try the following base R code, using by + combn:

with(
  df,
  do.call(
    rbind,
    Map(cbind,
      event_id = levels(factor(event_id)), 
      by(person_id, event_id, function(x) as.data.frame(t(combn(x, 2))))
    )
  )
)

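which gives the output below. Note that event 2 has a single person, and combn(3, 2) treats the scalar 3 as 1:3, so the rows for event 2 differ from the requested row (2, 3, 3).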

    event_id V1 V2
1          1  1  2
2.1        2  1  2
2.2        2  1  3
2.3        2  2  3
3.1        3  1  4
3.2        3  1  5
3.3        3  4  5
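
One way to keep single-person events intact is to special-case them before calling combn. The following is a minimal sketch, not part of the original answer: the helper pairs_of() and the variable names per_event and res are hypothetical, and it assumes a lone person should be paired with themselves, as in the requested output.

```r
# Hypothetical helper (an assumption, not from the original answer):
# all unordered pairs of x, with a single person paired with themselves.
pairs_of <- function(x) {
  if (length(x) < 2) {
    # combn() would expand a scalar n to 1:n, so handle this case directly
    return(data.frame(V1 = x, V2 = x))
  }
  as.data.frame(t(combn(x, 2)))
}

df <- data.frame(event_id = c(1, 1, 2, 3, 3, 3),
                 person_id = c(1, 2, 3, 1, 4, 5))

# Split person_id by event and build the pairs for each event
per_event <- lapply(split(df$person_id, df$event_id), pairs_of)

# Attach the event id to each piece, then stack the pieces
res <- do.call(
  rbind,
  Map(function(id, pairs) cbind(event_id = as.numeric(id), pairs),
      names(per_event), per_event)
)
rownames(res) <- NULL
res
```

With the sample data this should give one row for event 1, one for event 2 (person 3 paired with themselves), and three for event 3. The columns are still named V1 and V2, so rename them if person_1 and person_2 are needed.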
ThomasIsCoding
Another base R option is to merge the data frame with itself:

df <- data.frame(event_id = c(1, 1, 2, 3, 3, 3),
                 person_id = c(1, 2, 3, 1, 4, 5))

## see also:
## https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right
## Restricting ourselves to base R.

## Cross the data frame with itself within each event_id,
## keeping each unordered pair once (self-pairs are still included here).
cc1 <- merge(x = df, y = df, by = "event_id", all = TRUE) |>
  subset(person_id.x <= person_id.y)
colnames(cc1)[2:3] <- c("person_1", "person_2")
cc1

## Number of remaining rows per event_id.
evfreq <- data.frame(table(event_id = cc1$event_id))

## Drop self-pairs, but keep singleton events, e.g. event 2.
cc2 <- merge(x = cc1, y = evfreq, by = "event_id", all = TRUE) |>
  subset(person_1 != person_2 | Freq == 1, select = -Freq)
rownames(cc2) <- NULL
cc2

This gives the output as requested, using base R.

clp