-1

I would like to know how to append new data (rows) from one data frame, df2, to an existing data frame, df1, based on a unique value in each table. So I have an existing data frame, df1, that has historical data and each row has a unique value. I then pull data from the web and put it into a new data frame, df2. The new data frame also includes a unique value which may or may not match a unique value in df1.

I would like to take all rows in df2 that have a unique value that does not exist in df1, and append those rows to df1. My initial thoughts were to use code similar to this:

ifelse(any(df1$unique_val==df2$unique_val), df1 <- df1, df1 <- rbind(df2, df1))

But then I realized that I need a more one-to-one match than an "any" match. I know how I would do this in SQL with a UNION and WHERE clause, but I'm not sure how to make it work in R. The only related items I could find researching were appending all data from two data frames or appending a new column to an existing data frame.

The following example shows what I am looking for and why I am not looking to "join" these two data frames"

df1 = data.frame(numb = c(1:6), rand = c(rep("Toaster",6)))

df1$unique_val <- paste0(df1$numb, df1$rand)

> df1 numb rand unique_val 1 1 Toaster 1Toaster 2 2 Toaster 2Toaster 3 3 Toaster 3Toaster 4 4 Toaster 4Toaster 5 5 Toaster 5Toaster 6 6 Toaster 6Toaster

df2 = data.frame(numb = c(5:7), rand = c(rep("Toaster",2), c(rep("Radio",1))))

df2$unique_val <- paste0(df2$numb, df2$rand)

> df2 numb rand unique_val 1 5 Toaster 5Toaster 2 6 Toaster 6Toaster 3 7 Radio 7Radio

As you can see, row 3 in df2 is the only new row (a row that does not have a matching unique_val in df1). I would like to add this new row to df1. Note: it's not always the same row that is new in df2.

I used each of the joins from this post, merge/join data frames as follows:

merge(df1,df2, by = "unique_val")

merge(df1,df2, by = "unique_val", all = TRUE)

merge(df1,df2, by = "unique_val", all.x = TRUE)

merge(df1,df2, by = "unique_val", all.y = TRUE)

I also tried the anti_join from dplyr:

anti_join(df1,df2, by = "unique_val")

Rbind gives me the following:

rbind(df1,df2) numb rand conc 1 1 Toaster 1Toaster 2 2 Toaster 2Toaster 3 3 Toaster 3Toaster 4 4 Toaster 4Toaster 5 5 Toaster 5Toaster 6 6 Toaster 6Toaster 7 5 Toaster 5Toaster 8 6 Toaster 6Toaster 9 7 Radio 7Radio

None of which give me the desired output of the following:

numb rand conc 1 1 Toaster 1Toaster 2 2 Toaster 2Toaster 3 3 Toaster 3Toaster 4 4 Toaster 4Toaster 5 5 Toaster 5Toaster 6 6 Toaster 6Toaster 7 7 Radio 7Radio

I'm looking to rbind these data frames, not join them.

Community
  • 1
  • 1
d84_n1nj4
  • 1,712
  • 6
  • 23
  • 40
  • What about using unique(rbind(df1,df2))??? – Lourdes Hernández Sep 12 '16 at 14:54
  • 1
    Hi @Cath, @akrun solved my solution. I didn't ask the same question again--I provided much more detail as this website explains to do if you don't think it is a duplicate question, which I still do not believe it to be a duplicate. Fully read my question and use `merge(df1, df2, all.x=TRUE)`. It does NOT provide the solution I want to see as explained in my question. – d84_n1nj4 Sep 12 '16 at 15:15
  • @d84_n1nj4 Just do a full join: `merge(df1,df2,by=c('numb','rand'),all.x=TRUE,all.y=TRUE)` – Tensibai Sep 12 '16 at 15:21
  • 1
    corecting: **outer** join, and the syntax can be simplified: `merge(df1,df2,by=c('numb','rand'),all=TRUE)` – Tensibai Sep 12 '16 at 15:22

1 Answers1

-1

We can use rbindlist/unique from data.table. We place the datasets in a list, use rbindlist (from data.table) to rbind the datasets in the list to a single data.table and get the unique rows with unique from data.table which also has the by option to specify the variable.

library(data.table)
unique(rbindlist(list(df1, df2)), by = "numb")
#   numb    rand unique_val
#1:    1 Toaster   1Toaster
#2:    2 Toaster   2Toaster
#3:    3 Toaster   3Toaster
#4:    4 Toaster   4Toaster
#5:    5 Toaster   5Toaster
#6:    6 Toaster   6Toaster
#7:    7   Radio     7Radio
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 4
    Not my vote (and I won't vote here) but I think it could be related to [this meta post](http://meta.stackoverflow.com/a/255462/3627607). In my point of view your answer, albeit correct, is just dropping a bunch of function without any explanation on how it works/solve the problem. It make it not really useful for future readers as they'll have a hard time figuring what does what, so it's not useful **in my opinion** (and I assume in someone else opinion too). – Tensibai Sep 12 '16 at 15:13
  • 1
    You asked for an explanation, I just gave my point of view about it. I just added the precision because I can't speak for someone else. – Tensibai Sep 12 '16 at 15:19
  • 3
    The OP is way too long for me to read, but this may be relevant (since the same answer seems to fit): http://stackoverflow.com/q/39398135/ (Almost certainly not a dupe since the OP there wanted package-specific answers.) – Frank Sep 12 '16 at 15:23
  • @Frank That looks like a dupe. I will tag it – akrun Sep 12 '16 at 15:31