It sounds like you want to ensure that there is only one z value for each x and y pair. The main question is how you choose which z value to keep for each pair. From the description, I'm guessing you either want the second data frame to always override the first, or you want the maximum value to be taken.
Start with the raw data:
df1 <- data.frame(x = c(10L, 10L, 11L, 11L, 12L, 12L), y = c(10L, 12L, 10L, 12L, 10L, 12L), z = c(7L, 6L, 8L, 2L, 1L, 5L))
df2 <- data.frame(x = 10:12, y = c(10L, 10L, 12L), z = c(100L, 200L, 400L))
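Before choosing a rule, it can help to list which (x, y) pairs actually collide. A minimal base-R sketch: merge() performs an inner join on the key columns, so it returns only the pairs present in both frames, with the competing values side by side. The data is re-created here so the snippet runs on its own.

```r
# Same data as above
df1 <- data.frame(x = c(10L, 10L, 11L, 11L, 12L, 12L),
                  y = c(10L, 12L, 10L, 12L, 10L, 12L),
                  z = c(7L, 6L, 8L, 2L, 1L, 5L))
df2 <- data.frame(x = 10:12, y = c(10L, 10L, 12L), z = c(100L, 200L, 400L))

# Inner join on the key columns: one row per (x, y) pair found in both frames.
# The clashing values appear as z.x (from df1) and z.y (from df2).
conflicts <- merge(df1, df2, by = c("x", "y"))
print(conflicts)
```

Here three pairs, (10, 10), (11, 10) and (12, 12), have a value in both frames; those are the only rows where the choice of rule matters.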
If it's the max you want, then you probably want to simply combine the two frames and then extract the max for each x and y pair:
merged.df <- aggregate(z ~ x + y, data = rbind(df1, df2), max)
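Note that aggregate() returns only x, y and z. If you'd rather keep whole rows (for example, when other columns ride along with z), a base-R sketch of the same max rule is to sort so the largest z comes first within each pair, then drop repeated pairs with duplicated(). Because order() is stable, a tie on z keeps the row from df1.

```r
df1 <- data.frame(x = c(10L, 10L, 11L, 11L, 12L, 12L),
                  y = c(10L, 12L, 10L, 12L, 10L, 12L),
                  z = c(7L, 6L, 8L, 2L, 1L, 5L))
df2 <- data.frame(x = 10:12, y = c(10L, 10L, 12L), z = c(100L, 200L, 400L))

combined <- rbind(df1, df2)

# Within each (x, y) pair, put the row with the largest z first.
# order() is stable, so on ties in z the earlier row (from df1) stays first.
combined <- combined[order(combined$x, combined$y, -combined$z), ]

# duplicated() flags every row whose (x, y) pair was already seen; keep the rest
merged.df <- combined[!duplicated(combined[c("x", "y")]), ]
rownames(merged.df) <- NULL
print(merged.df)
```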
If, instead, you want the second data frame to override the first, then you would aggregate using the last value in each group:
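# rbind() puts df2's rows after df1's, so the last value in each group comes from df2 when present
merged.df <- aggregate(z ~ x + y, data = rbind(df1, df2), function(d) tail(d, n = 1))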
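The same "last one wins" rule can be written without aggregate(), which also keeps every column. A minimal base-R sketch: duplicated(..., fromLast = TRUE) flags all but the last occurrence of each (x, y) pair, so negating it keeps df2's row wherever a pair appears in both frames. Like the tail() version, it relies on df2 being stacked after df1.

```r
df1 <- data.frame(x = c(10L, 10L, 11L, 11L, 12L, 12L),
                  y = c(10L, 12L, 10L, 12L, 10L, 12L),
                  z = c(7L, 6L, 8L, 2L, 1L, 5L))
df2 <- data.frame(x = 10:12, y = c(10L, 10L, 12L), z = c(100L, 200L, 400L))

combined <- rbind(df1, df2)  # df2's rows come after df1's

# fromLast = TRUE scans from the bottom, so only the last occurrence of each
# (x, y) pair is treated as "not duplicated" and survives the filter
merged.df <- combined[!duplicated(combined[c("x", "y")], fromLast = TRUE), ]
print(merged.df)
```

The surviving rows stay in their original stacked order, so the pairs that came only from df1 are listed first.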
If you have many columns besides z, then I can only assume that you want the latter behavior. For this, you're better off using a package like data.table or dplyr. In dplyr, it would look like this:
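library(dplyr)  # across() needs dplyr >= 1.0.0; summarise_each() and funs() are deprecated
merged.df <- bind_rows(df1, df2) %>%
  group_by(x, y) %>%
  summarise(across(everything(), last), .groups = "drop")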
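Since both rules boil down to "keep one row per (x, y) pair", they can also be written with dplyr's row-slicing verbs. A sketch assuming dplyr 1.0.0 or later (where slice_tail() and slice_max() are available): slice_tail() keeps the last row of each group, and slice_max() with with_ties = FALSE keeps exactly one row with the largest z.

```r
library(dplyr)

df1 <- data.frame(x = c(10L, 10L, 11L, 11L, 12L, 12L),
                  y = c(10L, 12L, 10L, 12L, 10L, 12L),
                  z = c(7L, 6L, 8L, 2L, 1L, 5L))
df2 <- data.frame(x = 10:12, y = c(10L, 10L, 12L), z = c(100L, 200L, 400L))

combined <- bind_rows(df1, df2)  # df2's rows come last

# "Last one wins": the final row of each (x, y) group, i.e. df2's row if it has one
last_wins <- combined %>%
  group_by(x, y) %>%
  slice_tail(n = 1) %>%
  ungroup()

# "Max wins": the single row with the largest z in each (x, y) group
max_wins <- combined %>%
  group_by(x, y) %>%
  slice_max(z, n = 1, with_ties = FALSE) %>%
  ungroup()
```

With this particular data both rules pick the same rows, because every z in df2 is larger than the df1 value it collides with.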
With data.table, it would look like this:
library(data.table)
# Stack the frames (df2 last), then take the last value of each column per (x, y)
merged.df <- setDT(rbind(df1, df2))[, lapply(.SD, last), by = .(x, y)]
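data.table can express both rules with whole rows as well. A sketch, assuming a reasonably recent data.table: unique() with by = and fromLast = TRUE keeps the last row per (x, y) pair, and .SD[which.max(z)] keeps the row with the largest z in each group (the first such row on ties, since which.max() returns the first maximum).

```r
library(data.table)

df1 <- data.frame(x = c(10L, 10L, 11L, 11L, 12L, 12L),
                  y = c(10L, 12L, 10L, 12L, 10L, 12L),
                  z = c(7L, 6L, 8L, 2L, 1L, 5L))
df2 <- data.frame(x = 10:12, y = c(10L, 10L, 12L), z = c(100L, 200L, 400L))

dt <- rbindlist(list(df1, df2))  # df2's rows come last

# "Last one wins": deduplicate on the key columns, scanning from the bottom
last_wins <- unique(dt, by = c("x", "y"), fromLast = TRUE)

# "Max wins": within each (x, y) group, keep the row where z is largest
max_wins <- dt[, .SD[which.max(z)], by = .(x, y)]
```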