It looks like you're doing some merges here, and giving priority to the values for "Items" that are in data frame 2.
Try the code below which uses dplyr
package and left_join()
and full_join()
.
Loading the Data...
df1 <- read.table(header=TRUE, stringsAsFactors = FALSE, text=
'ID Name Status Items
16 Amy_B Closed 100
10 Erik_C Closed 80
14 Paul_R Closed 20
17 Chris_K Closed 40
19 Ali_I Closed 60
22 Jenny_A Closed 40')
df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"ID Name Items
14 Paul_R 10
14 Sarah_K 10
22 Jenny_A 30
22 Brian_L 10")
Merging the tables
# add the status column to df2
df <- left_join(df2, df1 %>% select(ID, Status), by = 'ID')
# ID Name Items Status
# 14 Paul_R 10 Closed
# 14 Sarah_K 10 Closed
# 22 Jenny_A 30 Closed
# 22 Brian_L 10 Closed
# combine both data frames by merging for both ID and Name
df <- full_join(df, df1,
by = c('ID', 'Name', 'Status'),
suffix = c('.1', '.2'))
# ID Name Items.1 Status Items.2
# 14 Paul_R 10 Closed 20
# 14 Sarah_K 10 Closed NA
# 22 Jenny_A 30 Closed 40
# 22 Brian_L 10 Closed NA
# 16 Amy_B NA Closed 100
# 10 Erik_C NA Closed 80
# 17 Chris_K NA Closed 40
# 19 Ali_I NA Closed 60
# create a new column which selects the df2 value if that exists, otherwise uses df1 value
df <- df %>%
mutate(Items = ifelse(is.na(Items.1), Items.2, Items.1)) %>%
select(-Items.1, -Items.2)
# ID Name Status Items
# 14 Paul_R Closed 10
# 14 Sarah_K Closed 10
# 22 Jenny_A Closed 30
# 22 Brian_L Closed 10
# 16 Amy_B Closed 100
# 10 Erik_C Closed 80
# 17 Chris_K Closed 40
# 19 Ali_I Closed 60
Putting it all together...
left_join(df2, df1 %>% select(ID, Status), by = 'ID') %>%
full_join(df1,
by = c('ID', 'Name', 'Status'),
suffix = c('.1', '.2')) %>%
mutate(Items = ifelse(is.na(Items.1), Items.2, Items.)) %>%
select(-Items.1, -Items.2)
Gives the following table as output:
ID Name Status Items
14 Paul_R Closed 10
14 Sarah_K Closed 10
22 Jenny_A Closed 30
22 Brian_L Closed 10
16 Amy_B Closed 100
10 Erik_C Closed 80
17 Chris_K Closed 40
19 Ali_I Closed 60