Fill missing values of dataframe based on value of matching other column with another dataframe

Question

I have two dataframes which look somewhat like this:

    dat1 <- data.frame(z = 1:20, val = seq(10, 105, by = 5))
    dat2 <- data.frame(z = seq(1, 20, by = 5), val = c(10, 20, NaN, NaN))

Which looks like:

dat1        dat2
 z val      z val
 5  30      0  15
 6  35      2  20
 7  40      8  40
 8  45      10 50  
 9  50      12 NaN 
10  55      15 NaN
11  60     
12  65
13  70
14  75
15  80
16  85
17  90
18  95
19 100
20 105

What I want is to fill in the NaN of the second dataframe with the same value in the val column where the z values are the same.

So I want my dat2 dataframe to look like:

I thought about getting the indexes of the NaN values, but they don't overlap. I need to match them somehow with the z column but fill in the NaN columns with those values. I don't really know how to do this.

I have tried the answers here, but all of these remove all values in dat2 below the minimum values of dat1, which I need to keep as well. Now the result looks like this:

dat2
z val
8 40
10 50
12 65
15 80

`transform(merge(dat1, dat2, by = 'z'), val = ifelse(is.na(val.y), val.x, val.y))[names(dat1)]` — Ronak Shah, Jul 26 '20 at 13:27
This does not work sadly, the dimension are not the same is the issue I'm getting — B.Quaink, Jul 27 '20 at 16:42

itsMeInMiami · Accepted Answer · 2020-07-28T11:13:33.850

This should work:

library(dplyr)
inner_join(dat2, dat1, by = "z") %>% 
  mutate(answer = coalesce(val.x, val.y))

inner joins try to match datasets based on a key (the by value). If it can find the records in both tables it keeps the record. The %>% is a "pipe" that says take this dataset and feed it to the next function. mutate is used to create a new variable in the tidyverse dialect of R. Coalesce returns the first not missing value. So, if val.x is not missing, it is used, if it is missing then val.y is used.

Edited:

Try this:

dat1 <- data.frame(z = 5:20, val = seq(30, 105, by = 5))
dat2 <- data.frame(z = c(0, 2, seq(1, 20, by = 5)), val = c(15, 20, 40, 50, NaN, NaN))

all_z <- tibble(z = sort(union(dat1$z, dat2$z)))

library(dplyr)

left_join(all_z, dat1, by = "z") %>% 
    left_join(dat2, by = "z") %>% 
    mutate(answer = coalesce(val.x, val.y)) %>% 
    select(z, answer)

To see how it works run the lines in order without the %>%. For example, try:

left_join(all_z, dat1, by = "z")

then try

left_join(all_z, dat1, by = "z") %>% 
    left_join(dat2, by = "z")

etc.

Hi, sadly this does not work completely. I edited my comment a bit, I have some values below the lowest value of the left dataframe which get filtered out after doing this. — B.Quaink, Jul 27 '20 at 16:32

Fill missing values of dataframe based on value of matching other column with another dataframe

1 Answers1