Replace dataframe one based on ID from dataframe two

Question

I'd like to replace rows in the first data frame below with rows from the second data frame, matched on the ID column. For example, suppose person X has 100 items in dataframe 1, but dataframe 2 shows that he actually has only 50 items and the other 50 belong to person Z. The final result should then have one row for person X with 50 items and another row for person Z with 50 items, both with the same ID.

Dataframe 1

ID      Name        Status  Items
16      Amy B       Closed  100
10      Erik C      Closed  80
14      Paul R      Closed  20
17      Chris K     Closed  40
19      Ali I       Closed  60
22      Jenny A     Closed  40

Dataframe 2

ID  Name    Items
14  Paul R  10
14  Sarah K 10
22  Jenny A 30
22  Brian L 10

Expected result

ID  Name    Status  Items
16  Amy B   Closed  100
10  Erik C  Closed  80
14  Paul R  Closed  10
14  Sarah K Closed  10
17  Chris K Closed  40
19  Ali I   Closed  60
22  Jenny A Closed  30
22  Brian L Closed  10

Required reading: [*How to join (merge) data frames (inner, outer, left, right)?*](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) — Jaap, Feb 15 '18 at 08:33
So you want to a) check intersection of IDs in df1 and df2, b) remove those IDs from df1, and c) rbind those IDs from df2 to df1? Have you tried anything so far that didn't work? — talat, Feb 15 '18 at 08:36
I have used a left join: r <- merge(x = first_dataframe, y = second_dataframe, by = "ID", all.x = TRUE), but that created two additional columns, Name.y and Items.y, alongside Name.x and Items.x. Is there a way to move the rows from Name.y and Items.y into Name.x and Items.x? — user5879741, Feb 15 '18 at 08:50

Brandon · Accepted Answer · 2018-02-15T09:19:39.087

It looks like you're doing some merges here, and giving priority to the values for "Items" that are in data frame 2.

Try the code below, which uses left_join() and full_join() from the dplyr package.

Loading the Data...

df1 <- read.table(header=TRUE, stringsAsFactors = FALSE, text=
'ID      Name        Status  Items
16      Amy_B       Closed  100
10      Erik_C      Closed  80
14      Paul_R      Closed  20
17      Chris_K     Closed  40
19      Ali_I       Closed  60
22      Jenny_A     Closed  40')


df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"ID  Name    Items
14  Paul_R  10
14  Sarah_K 10
22  Jenny_A 30
22  Brian_L 10")

Merging the tables

library(dplyr)

# add the Status column to df2
df <- left_join(df2, df1 %>% select(ID, Status), by = 'ID')
# ID    Name Items Status
# 14  Paul_R    10 Closed
# 14 Sarah_K    10 Closed
# 22 Jenny_A    30 Closed
# 22 Brian_L    10 Closed

# combine both data frames by merging on ID, Name and Status
df <- full_join(df, df1, 
                by = c('ID', 'Name', 'Status'),
                suffix = c('.1', '.2'))
# ID    Name Items.1 Status Items.2
# 14  Paul_R      10 Closed      20
# 14 Sarah_K      10 Closed      NA
# 22 Jenny_A      30 Closed      40
# 22 Brian_L      10 Closed      NA
# 16   Amy_B      NA Closed     100
# 10  Erik_C      NA Closed      80
# 17 Chris_K      NA Closed      40
# 19   Ali_I      NA Closed      60

# create a new column which selects the df2 value if that exists, otherwise uses df1 value
df <- df %>% 
    mutate(Items = ifelse(is.na(Items.1), Items.2, Items.1)) %>% 
    select(-Items.1, -Items.2)
# ID    Name Status Items
# 14  Paul_R Closed    10
# 14 Sarah_K Closed    10
# 22 Jenny_A Closed    30
# 22 Brian_L Closed    10
# 16   Amy_B Closed   100
# 10  Erik_C Closed    80
# 17 Chris_K Closed    40
# 19   Ali_I Closed    60

Putting it all together...

left_join(df2, df1 %>% select(ID, Status), by = 'ID') %>%
    full_join(df1,
              by = c('ID', 'Name', 'Status'),
              suffix = c('.1', '.2')) %>%
    mutate(Items = ifelse(is.na(Items.1), Items.2, Items.1)) %>%
    select(-Items.1, -Items.2)

Gives the following table as output:

ID    Name Status Items
14  Paul_R Closed    10
14 Sarah_K Closed    10
22 Jenny_A Closed    30
22 Brian_L Closed    10
16   Amy_B Closed   100
10  Erik_C Closed    80
17 Chris_K Closed    40
19   Ali_I Closed    60
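
As a possible variation on the mutate() step, dplyr's coalesce() returns the first non-missing value across its arguments, so it can stand in for the ifelse(is.na(...)) pattern. The following is only a minimal sketch, assuming dplyr is installed; it rebuilds the sample data inline with data.frame(), and the name `result` is introduced here for illustration.

```r
library(dplyr)

# Sample data rebuilt inline (same values as df1 and df2 in this answer)
df1 <- data.frame(
  ID     = c(16, 10, 14, 17, 19, 22),
  Name   = c("Amy_B", "Erik_C", "Paul_R", "Chris_K", "Ali_I", "Jenny_A"),
  Status = "Closed",
  Items  = c(100, 80, 20, 40, 60, 40),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID    = c(14, 14, 22, 22),
  Name  = c("Paul_R", "Sarah_K", "Jenny_A", "Brian_L"),
  Items = c(10, 10, 30, 10),
  stringsAsFactors = FALSE
)

result <- df2 %>%
  # bring Status over from df1
  left_join(select(df1, ID, Status), by = "ID") %>%
  # keep every row from both tables; Items becomes Items.1 (df2) and Items.2 (df1)
  full_join(df1, by = c("ID", "Name", "Status"), suffix = c(".1", ".2")) %>%
  # prefer the df2 value, fall back to the df1 value when it is NA
  mutate(Items = coalesce(Items.1, Items.2)) %>%
  select(ID, Name, Status, Items)
```

Both columns passed to coalesce() should have compatible types; here both are numeric.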

@user5879741 Great! Please note that I had a typo and just edited the code to fix it, so make sure that the current version runs for you as expected. — Brandon, Feb 15 '18 at 09:20

moodymudskipper · Answer 2 · 2018-02-15T09:05:14.340

Assuming your real data is as regular as your sample data, you have redundant information. The important information is:

the number of unsplit items by ID in df1
the number of split items in df2
the status, linked to ID in df1

So first we add the Status info to df2 (merge(df2,df1[c(1,3)])), then we rbind the relevant item info from df1 and df2.

rbind(df1[!df1$ID %in% df2$ID, ], merge(df2, df1[c(1, 3)]))

#    ID    Name Status Items
# 1  16 Amy B   Closed   100
# 2  10 Erik C  Closed    80
# 4  17 Chris K Closed    40
# 5  19 Ali I   Closed    60
# 11 14 Paul R  Closed    10
# 21 14 Sarah K Closed    10
# 3  22 Jenny A Closed    30
# 41 22 Brian L Closed    10
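
The same one-liner can be unpacked into named steps, which may be easier to adapt. This is a sketch in base R only. The intermediate names `untouched`, `split_rows` and `result` are introduced here for illustration, and the sample data is rebuilt inline without the trailing spaces in the names.

```r
# Sample data rebuilt inline (names without trailing padding)
df1 <- data.frame(
  ID     = c(16, 10, 14, 17, 19, 22),
  Name   = c("Amy B", "Erik C", "Paul R", "Chris K", "Ali I", "Jenny A"),
  Status = "Closed",
  Items  = c(100, 80, 20, 40, 60, 40),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID    = c(14, 14, 22, 22),
  Name  = c("Paul R", "Sarah K", "Jenny A", "Brian L"),
  Items = c(10, 10, 30, 10),
  stringsAsFactors = FALSE
)

# 1. rows of df1 whose ID does not appear in df2 stay as they are
untouched <- df1[!df1$ID %in% df2$ID, ]

# 2. rows of df2 get their Status looked up from df1 via ID
split_rows <- merge(df2, df1[c("ID", "Status")], by = "ID")

# 3. put the columns in the same order as df1, then stack both parts
split_rows <- split_rows[names(df1)]
result <- rbind(untouched, split_rows)
```

Selecting columns by name (c("ID", "Status")) rather than by position (c(1, 3)) is a design choice made here so the sketch keeps working if the column order changes.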

data

df1 <- read.table(text="ID      Name        Status  Items
16      'Amy B  '     Closed  100
10      'Erik C '     Closed  80
14      'Paul R '     Closed  20
17      'Chris K'     Closed  40
19      'Ali I  '     Closed  60
22      'Jenny A'     Closed  40", header = TRUE, stringsAsFactors = FALSE)

df2<- read.table(text="ID  Name    Items
14  'Paul R ' 10
14  'Sarah K' 10
22  'Jenny A' 30
22  'Brian L' 10", header = TRUE, stringsAsFactors = FALSE)
