Use apply on a dataframe to fill in missing values from another dataframe

Question

First I want to say I am new to R. This problem is frustrating beyond belief. I have tried apply, lapply, and mapply. All with errors. I am lost.

What I want to do is take the time from "Results" and place it in the time in "Records" IF Records does not have a time (where it is NA).

I have already done this in a traditional for-loop but it makes the code hard to read. I have read the apply functions can make this easier.

Data Frame "Results"
ID Time(sec)    
1  1.7169811
2  1.9999999
3  2.3555445
4  3.4444444

Data Frame "Records"
ID Time(sec) Date
1  NA        1/1/2018
2  1.9999999 1/1/2018
3  NA        1/1/2018
4  3.1111111 1/1/2018

Data Frame 'New' Records
ID Time(sec)   Date
1  1.7169811 1/1/2018
2  1.9999999 1/1/2018
3  2.3555445 1/1/2018
4  3.1111111 1/1/2018

Try something like that: `records$time <- ifelse(is.na(records$time), results$time, records$time)` — Karolis Koncevičius, Mar 10 '18 at 19:06
Possible duplicate of [Conditional replacement of values in a data.frame](https://stackoverflow.com/questions/8214303/conditional-replacement-of-values-in-a-data-frame) — Karolis Koncevičius, Mar 10 '18 at 19:08
@KarolisKoncevičius How does the ifelse know that it is matching on the correct ID? — Shawn, Mar 10 '18 at 19:34
`ifelse` will not match the IDs. From the example it looked like they are matched. You will need to use `match()` to obtain the ID-matched `results` values. — Karolis Koncevičius, Mar 10 '18 at 19:37
try something like `ifelse(is.na(records$time), results$time[match(records$ID, results$ID)], records$time)` — Karolis Koncevičius, Mar 10 '18 at 19:44
@KarolisKoncevičius That makes sense. I was over complicating this using apply then. I can mark this as the answer if you post below. — Shawn, Mar 10 '18 at 19:52

score 3 · Accepted Answer · answered Mar 10 '18 at 19:59

No need to use apply in this situation. A pattern of conditionally choosing between two values based on some predicate is ifelse():

ifelse(predicate, value_a, value_b)

In this case you said you also have to make sure the values are matched by ID between the two dataframes. A function that achieves this in R is appropriately named match()

match(target_values, values_to_be_matched)

match returns indices that match values_to_be_matched to target_values when used like so: target_values[indices].

Combining this together:

inds <- match(records$ID, results$ID)
records$time <- ifelse(is.na(records$time), results$time[inds], records$time)

is.na() here is a predicate that checks if the value is NA for every value in the vector.

score 3 · Answer 2 · answered Mar 10 '18 at 20:00

Inspired by this answer.

From the help: Given a set of vectors, coalesce() finds the first non-missing value at each position. This is inspired by the SQL COALESCE function which does the same thing for NULLs

    library(tidyverse)

    txt1 <- "ID Time(sec)    
    1  1.7169811
    2  1.9999999
    3  2.3555445
    4  3.4444444"

    txt2 <- "ID Time(sec) Date
    1  NA        1/1/2018
    2  1.9999999 1/1/2018
    3  NA        1/1/2018
    4  3.1111111 1/1/2018"

    df1 <- read.table(text = txt1, header = TRUE)
    df2 <- read.table(text = txt2, header = TRUE)

    df1 %>% 
      left_join(df2, by = "ID") %>% 
      mutate(Time.sec. = coalesce(Time.sec..x, Time.sec..y)) %>% 
      select(-Time.sec..x, -Time.sec..y)

    #>   ID     Date Time.sec.
    #> 1  1 1/1/2018  1.716981
    #> 2  2 1/1/2018  2.000000
    #> 3  3 1/1/2018  2.355545
    #> 4  4 1/1/2018  3.444444

Created on 2018-03-10 by the reprex package (v0.2.0).

Use apply on a dataframe to fill in missing values from another dataframe

2 Answers2

Linked