0

I have two datasets.

Dataset 1 looks like the following:

dat1 <- read.table(header=TRUE, text="
ID  Per  Gu  Ta
1123    112301  14  13
                   1124 112401  14  19
                   1125 112501 29  25
                   1126 112601  22  20
                   ")

dat1

    ID    Per Gu Ta
1 1123 112301 14 13
2 1124 112401 14 19
3 1125 112501 29 25
4 1126 112601 22 20

Dataset 2 looks like the following:

dat2 <- read.table(header=TRUE, text="
ID  Veh  Pert  Ti  ID1
1123    1 1 100 11231
                   1123 2 1 110 11232
                   1124 1 1 107 11241
                   1124 2 1 111 11242
                   1124 3 2 109 11243
                   1125 2 2 118 11251
                   1125 3 3 113 11252
                   1125 4 1 108 11253
                   1126 3 4 119 11265
                   1126 3 1 112 11268
                   ")
dat2
     ID Veh Pert  Ti   ID1
1  1123   1    1 100 11231
2  1123   2    1 110 11232
3  1124   1    1 107 11241
4  1124   2    1 111 11242
5  1124   3    2 109 11243
6  1125   2    2 118 11251
7  1125   3    2 113 11252
8  1125   4    1 108 11253
9  1126   3    4 119 11265
10 1126   3    1 112 11268

dat1 is needed to be left joined with dat2 by ID with rows having the minimum of at first Veh and then Pert of dat2. The final data will be like the following:

ID   Per    Gu Ta Veh  Pert  Ti  ID1
1123 112301 14 13 1     1    100 11231
1124 112401 14 19 1     1    107 11241
1125 112501 29 25 2     2    118 11251   ### in `row 8` of `dat2` the min value of `Per` is `1` but `Veh` is `4`;
1126 112601 22 20 3     1    112 11268
user438383
  • 5,716
  • 8
  • 28
  • 43
S Das
  • 3,291
  • 6
  • 26
  • 41
  • Does this answer your question? [How to do a conditional join in R with dplyr?](https://stackoverflow.com/questions/52648580/how-to-do-a-conditional-join-in-r-with-dplyr) – Mark Jun 25 '23 at 23:19

3 Answers3

1

I've done it this way:

  1. left_join then arrange based on Veh and Pert
  2. group_by ID and picking first values
library(dplyr)

dat1 <- read.table(
  header = TRUE,
  text = "
ID  Per  Gu  Ta
1123    112301  14  13
                   1124 112401  14  19
                   1125 112501 29  25
                   1126 112601  22  20
                   "
)

dat2 <- read.table(
  header = TRUE,
  text = "
ID  Veh  Pert  Ti  ID1
1123    1 1 100 11231
                   1123 2 1 110 11232
                   1124 1 1 107 11241
                   1124 2 1 111 11242
                   1124 3 2 109 11243
                   1125 2 2 118 11251
                   1125 3 3 113 11252
                   1125 4 1 108 11253
                   1126 3 4 119 11265
                   1126 3 1 112 11268
                   "
)

# left join or order by Veh then Pert
left_joined_df <-
  dat1 %>% left_join(dat2, by = c("ID" = "ID")) %>% arrange(Veh, Pert)


# group by ID and picking first values
result_df <-
  left_joined_df %>% group_by(ID) %>% summarize(
    .groups = 'keep',
    Per = first(Per),
    Gu = first(Gu),
    Ta = first(Ta),
    Veh = first(Veh),
    Pert = first(Pert),
    Ti = first(Ti),
    ID1 = first(ID1)
  ) %>% ungroup()
Ehsan Hamzei
  • 339
  • 2
  • 8
1
dat1 |>
  left_join(dat2 |>
              arrange(Veh, Pert) |>
              slice(1, .by = ID))

Joining with `by = join_by(ID)`
    ID    Per Gu Ta Veh Pert  Ti   ID1
1 1123 112301 14 13   1    1 100 11231
2 1124 112401 14 19   1    1 107 11241
3 1125 112501 29 25   2    2 118 11251
4 1126 112601 22 20   3    1 112 11268
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
1

Easily done.

library(tidyverse)

dat_join <- left_join(dat1, dat2, by = "ID") %>%
  arrange(Veh) %>%
  arrange(Pert)

dat_join
U Bhalraam
  • 96
  • 7