3

I'm currently working on incorporating elements of one data table with another "master" set, by some reference column. To make things clearer, I have created some sample data:

This is the dataset I am looking to join onto another "master set".

data.frame(refID = c(1,3,4,5,7,8), value = c(3.3,3.9,4.4,8.0,1.1,2.5))

refID   value
1       3.3
3       3.9
4       4.4
5       8.0
7       1.1
8       2.5

The master set:

data.frame(refID = 1:9, value = rep(0,9))

refID   value
1       0
2       0
3       0
4       0
5       0
6       0
7       0
8       0
9       0

I am basically looking to send the column of values from the first data set into the second, but where there are gaps, let them have a value of 0. Ultimately, I am hoping to get:

Resulting Set:

refID   value
1       3.3
2       0.0   
3       3.9
4       4.4
5       8.0
6       0.0
7       1.1
8       2.5
9       0.0

I've played around with some stuff in the dplyr and data.table packages but can't seem to really pinpoint a good and direct way of doing it. Advice would be greatly appreciated, many thanks.

Arun
  • 116,683
  • 26
  • 284
  • 387
user1398057
  • 1,109
  • 1
  • 11
  • 15

2 Answers2

6

Using data.table, you can replace values from the first data.table on to the second by reference as follows:

require(data.table)

# data
DT1 = data.table(refID = c(1,3,4,5,7,8), value = c(3.3,3.9,4.4,8.0,1.1,2.5))
DT2 = data.table(refID = 1:9, value = 0)

setkey(DT2, refID)
DT2[DT1, value := i.value]
#    refID value
# 1:     1   3.3
# 2:     2   0.0
# 3:     3   3.9
# 4:     4   4.4
# 5:     5   8.0
# 6:     6   0.0
# 7:     7   1.1
# 8:     8   2.5
# 9:     9   0.0

Please refer to this post for explanation.

Community
  • 1
  • 1
Arun
  • 116,683
  • 26
  • 284
  • 387
0

The way I can think of would be to temporarily name the values val1 and val2, do a full_join so non-matching values are NA, then mutate a value column using the presence of the NAs. i.e. something like:

df_1 <- data.frame(refID = c(1,3,4,5,7,8), v1 = c(3.3,3.9,4.4,8.0,1.1,2.5))
df_2 <- data.frame(refID = 1:9, v2 = 0)

df_merged <- df_1 %>% 
    full_join(df_2) %>%
    mutate(value=ifelse(!is.na(v1), v1, v2) %>% # or just 0 in this case 
    select(refID, value)
JonMinton
  • 1,239
  • 2
  • 8
  • 26