Merge two tibbles using an id and year

Question

I have two tibbles that look like this:

1.

a <- tribble(
  ~id_emca,       ~year,      ~id_estmun.x,   ~id_estmun.y, 
  "DLAGU20",       2006,          1005,          1005,    
  "DLAGU20",       2007,          1005,          1005,    
  "DLAGU20",       2008,          1005,          1005,    
  "DLBAJ155",      2012,          3003,          3001,    
  "DLBAJ155",      2013,          3003,          3001,  
  "DLBAJ157",      2017,          3007,          5010,    
  "DLBAJ157",      2018,          3007,          5010,
)

  id_emca   year id_estmun.x id_estmun.y
  <chr>    <dbl>       <dbl>       <dbl>
1 DLAGU20   2006        1005        1005
2 DLAGU20   2007        1005        1005
3 DLAGU20   2008        1005        1005
4 DLBAJ155  2012        3003        3001
5 DLBAJ155  2013        3003        3001
6 DLBAJ157  2017        3007        5010
7 DLBAJ157  2018        3007        5010

b <- tribble(
  ~start,       ~end,      ~id_estmun,   ~Party,     
    2005,        2007,          1005,      "PRI",
    2008,        2010,          1005,      "PAN",
    2010,        2012,          3001,      "PRI",
    2013,        2015,          3001,      "PT",
    2010,        2012,          3003,      "COAL",
    2013,        2015,          3003,      "PRD",
    2016,        2018,          3007,      "PAN",
    2019,        2021,          3007,      "COAL",
    2016,        2018,          5010,      "PRI",
) 

  start   end id_estmun Party
  <dbl> <dbl>     <dbl> <chr>
1  2005  2007      1005 PRI  
2  2008  2010      1005 PAN  
3  2010  2012      3001 PRI  
4  2013  2015      3001 PT   
5  2010  2012      3003 COAL 
6  2013  2015      3003 PRD  
7  2016  2018      3007 PAN  
8  2019  2021      3007 COAL 
9  2016  2018      5010 PRI

I am trying to create a new tibble in such a way that if the id_estmun.x and id_estmun.y variables are equal to the id_estmun on tibble 2 and the year variable is equal or in between the start and end variables, the variable Party should be merged to the right side of the first tibble.

Here is an example of what the new tibble should look like, this should make everything more clear:

id_emca   year id_estmun.x id_estmun.y Party.x Party.y
  <chr>    <dbl>       <dbl>       <dbl> <chr>   <chr>  
1 DLAGU20   2006        1005        1005 PRI     PRI    
2 DLAGU20   2007        1005        1005 PRI     PRI    
3 DLAGU20   2008        1005        1005 PAN     PAN    
4 DLBAJ155  2012        3003        3001 COAL    PRI    
5 DLBAJ155  2013        3003        3001 PRD     PT     
6 DLBAJ157  2017        3007        5010 PAN     PRI    
7 DLBAJ157  2018        3007        5010 PAN     PRI

This is what I have tried and it seems to work, but I was wondering if there was an easier way to do it:

c <- a %>% 
  left_join(b, 
            by = c("id_estmun.x"= "id_estmun"),
            keep = TRUE) %>% 
  filter(year >= start & year <= end)  %>% rename(Party.x = Party) %>% 
  dplyr::select(-one_of(c("start", "end", "id_estmun")))

d <- c %>% 
  left_join(b, 
            by = c("id_estmun.y"= "id_estmun"),
            keep = TRUE) %>% 
  filter(year >= start & year <= end)  %>% rename(Party.y = Party) %>% 
  dplyr::select(-one_of(c("start", "end", "id_estmun")))

What did you try so far? Did you try [this](https://stackoverflow.com/questions/21888910/how-to-specify-names-of-columns-for-x-and-y-when-joining-in-dplyr) and if so what didn't work? `by c("year"= "start", "id_estmun" = "id_estmun.x" ` should work shouldn't it? — Annet, May 10 '21 at 18:27
for matching on date ranges, I've found doing the joins with `sqldf` easiest. [look at this solution](https://stackoverflow.com/questions/23958975/join-tables-by-date-range) — Eric, May 10 '21 at 18:49
@Annet I added to the question what I have tried so far, it seems to work but I think there is a more efficient way to do it — Quinoba, May 10 '21 at 18:53
do you require that `'id_estmun.x' == 'id_estmun' & 'id_estmun.y' == 'id_estmun'` or `'id_estmun.x' == 'id_estmun' | 'id_estmun.y' == 'id_estmun'`. For example, in the former, the only time `id_estmun.x`, `id_estmun.y` and `id_estmun` are all equal is when it equals `1005`. — Justin Landis, May 10 '21 at 19:46

Annet · Answer 1 · 2021-05-10T20:36:23.000

Your code produces a warning. You try to select (or not to select? I dont know what that -one_of does and I didn't look it up as it isn't necessarily relevant for my answer) a variable that doesn't exist (id_estmun). Although I usually kind of ignore warnings, I wouldn't ignore one like this as it indicates you are not in control of what is happening in the selection process. So I think it is definitely better to use a different code. Simply changing that id_estmun to id_estmun.x or id_estmun.y also resolves that warning.

I think a shorter version of your code (with explanation in between) is this:

a %>% 
  # this puts the ids in 1 column rather than 2
  pivot_longer(cols = c(id_estmun.x, id_estmun.y), names_to = "names", values_to = "id_estmun") %>% 
  # add the data from table b
  left_join(b, by = "id_estmun") %>%
  # removes the years we do not want
  filter(year >= start & year <= end) %>% 
  # only keep the x and y for the naming
  mutate(names = gsub("[A-z]+\\_[A-z]+\\.","",names)) %>% 
  # we take the name from the column names we created 
  # and the values from both the id_estmun and Party column, which results in 4
  # new columns - two id columns and two party columns
  pivot_wider(names_from = names, values_from = c(id_estmun, Party))

The final result will look like:

 id_emca   year start   end id_estmun_x id_estmun_y Party_x Party_y
  <chr>    <dbl> <dbl> <dbl>       <dbl>       <dbl> <chr>   <chr>  
1 DLAGU20   2006  2005  2007        1005        1005 PRI     PRI    
2 DLAGU20   2007  2005  2007        1005        1005 PRI     PRI    
3 DLAGU20   2008  2008  2010        1005        1005 PAN     PAN    
4 DLBAJ155  2012  2010  2012        3003        3001 COAL    PRI    
5 DLBAJ155  2013  2013  2015        3003        3001 PRD     PT     
6 DLBAJ157  2017  2016  2018        3007        5010 PAN     PRI    
7 DLBAJ157  2018  2016  2018        3007        5010 PAN     PRI

Merge two tibbles using an id and year

1 Answers1