Update: My apologies... I should have specified that there are missing values for some tests (come conducted n times, others conducted m times), and I'm thinking that's going to affect the answer based on the merge
documentation:
all.x
logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.
I don't want either of these options: namely, I don't want only matching row/column values kept, nor do I want non-matching row/column intersection values filled in with NAs.
I have data from an experiment and stored my chemical compositions in one file and testing in another. I'd like to merge these so I can plot boxplots faceted by composition factor level, but am having a hard time figuring out how to do it. Here's some example data (adjusted per the above):
set.seed(1)
df.ids <- data.frame(lot=c("A","B","C"),
var1=c(1,2,3),
var2=c(4,5,6))
df.testing <- data.frame(lot=c(rep("A",4), rep("B",4),rep("C",4)),
test1=rep(c(runif(3,min=1,max=5),NA),3),
test2=rep(c(runif(2,min=1,max=5),NA,NA),3),
test3=rep(runif(4,min=1,max=5),3))
Like my real data, the ids data frame has one row per factor level, but the test data has multiple for each test replicate. How might I merge these so that I fill in the compositions for each of the lots? In other words, my resultant data frame would look like this:
| lot | test1 | test2 | test3 | var1 | var2 |
|-----+----------+----------+----------+------+------|
| A | 2.062035 | 4.632831 | 4.593559 | 1 | 4 |
| A | 2.488496 | 1.806728 | 4.778701 | 1 | 4 |
| A | 3.291413 | NA | 3.643191 | 1 | 4 |
| A | NA | NA | 3.516456 | 1 | 4 |
| B | 2.062035 | 4.632831 | 4.593559 | 2 | 5 |
| B | 2.488496 | 1.806728 | 4.778701 | 2 | 5 |
| B | 3.291413 | NA | 3.643191 | 2 | 5 |
| B | NA | NA | 3.516456 | 2 | 5 |
| C | 2.062035 | 4.632831 | 4.593559 | 3 | 6 |
| C | 2.488496 | 1.806728 | 4.778701 | 3 | 6 |
| C | 3.291413 | NA | 3.643191 | 3 | 6 |
| C | NA | NA | 3.516456 | 3 | 6 |
In the end, I suppose all I'm trying to do is match unique lot IDs and then just replicate the values from df.ids
for each of it's columns into df.testing
. I just don't know how (well, I thought of doing it with a for()
loop, but I'm sure there's a better way)!