
I have a data frame with two persons observed at three points in time (three rows with "id" == 1 and three rows with "id" == 2):

id <- c(1, 1, 1, 2, 2, 2)
id2 <- c(NA, NA, NA, 1, 1, 1)
x <- c(4, 5, 5, 1, 1, 1)
dat1 <- data.frame(id, id2, x)
dat1

  id id2 x
1  1  NA 4
2  1  NA 5
3  1  NA 5
4  2   1 1
5  2   1 1
6  2   1 1

Now I want to create a new variable "y" with the following rule: if "id2" is not NA, "y" should be the most frequent value of "x" for the person whose "id" equals that "id2". In this example data, the person with "id" == 2 gets a 5 in "y" at all points in time, because person 2 has a 1 in "id2" and 5 is the most frequent value of "x" for the person with "id" == 1. Since "id2" is NA for person 1, "y" is NA as well (there is no other person to refer to for person 1). The result is:

  id id2 x y
1  1  NA 4 NA
2  1  NA 5 NA
3  1  NA 5 NA
4  2   1 1 5
5  2   1 1 5
6  2   1 1 5

Is there a way to do this with dplyr?

C.F.

1 Answer


We can compute the mode of 'x' within each 'id', then match 'id2' against 'id' and use the matched positions to pull in the corresponding mode values:

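library(dplyr)
# Mode() (defined below) must exist before this pipeline runs
dat1 %>% 
    group_by(id) %>%
    mutate(tmp = Mode(x)) %>%   # per-person mode of x, repeated on every row of that person
    ungroup() %>%
    mutate(y = tmp[match(id2, id)], tmp = NULL)  # mode of the person named in id2; NA id2 gives NA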

where `Mode` is a small helper that returns the most frequent value of a vector:

Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # count each value; ties go to the one seen first
}
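For readers who want to check the logic without dplyr, the same group-then-match idea can be sketched in base R. This is an illustrative equivalent, not the answer's code: `ave()` stands in for `group_by(id) %>% mutate(tmp = Mode(x))`, and `tmp` is just a hypothetical helper vector. Because `which.max()` returns the first maximum, `Mode()` resolves ties in favour of the value that appears first.

```r
# Example data from the question
dat1 <- data.frame(
  id  = c(1, 1, 1, 2, 2, 2),
  id2 = c(NA, NA, NA, 1, 1, 1),
  x   = c(4, 5, 5, 1, 1, 1)
)

# Mode helper from the answer (ties go to the value that appears first)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# ave() applies Mode within each id and recycles the result over that id's rows,
# playing the role of group_by(id) %>% mutate(tmp = Mode(x))
tmp <- ave(dat1$x, dat1$id, FUN = Mode)

# match() finds the first row whose id equals id2; an NA id2 finds no row, so y is NA
dat1$y <- tmp[match(dat1$id2, dat1$id)]
dat1
```

Printing `dat1` gives y = NA for person 1 and y = 5 for person 2, matching the expected result in the question.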
akrun
  • A follow-up question: how do I set "y" to always be the same as "x" of the matched person, row by row? In this example data that means: person 2, who is matched to person 1 via "id2", gets y = c(4, 5, 5); since "id2" is NA for person 1, "y" stays NA for person 1. This rule is simpler and could be useful in further work... – C.F. Nov 17 '21 at 20:03
  • @C.F. Maybe you need `match(id2, unique(id))`? Could you please post this as a new question so I can understand it better? Thanks – akrun Nov 17 '21 at 20:05
  • https://stackoverflow.com/questions/70017603/new-variables-using-ids-linking-different-cases-in-longidutinal-data – C.F. Nov 18 '21 at 09:46
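One reading of the follow-up is that each row with a non-NA "id2" should take the "x" value of the referenced person at the same time point. A minimal base R sketch under that reading, assuming rows are already sorted by time within each person and every person has the same number of time points; the `time` column and the `key_*` vectors are hypothetical helpers, not part of the original data:

```r
# Example data from the question
dat1 <- data.frame(
  id  = c(1, 1, 1, 2, 2, 2),
  id2 = c(NA, NA, NA, 1, 1, 1),
  x   = c(4, 5, 5, 1, 1, 1)
)

# Assumption: the k-th row of each person is time point k
dat1$time <- ave(dat1$id, dat1$id, FUN = seq_along)

# Key every row by (person, time); rows with NA id2 get an NA key
key_self  <- paste(dat1$id, dat1$time)
key_match <- ifelse(is.na(dat1$id2), NA, paste(dat1$id2, dat1$time))

# For each row, take x from the row of person id2 at the same time point;
# an NA key matches nothing, so y stays NA
dat1$y <- dat1$x[match(key_match, key_self)]
dat1
```

Under these assumptions person 2 receives y = c(4, 5, 5) and person 1 keeps NA. If the time points are stored in a real column, that column can replace the `ave()` step.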