I run into this scenario often:
I have a designed experiment in which each experimental unit is assigned a treatment, and I have a data frame describing the experimental design (df.design). The final data file (df.data) may contain a different number of measurements per unit, and it includes only the ID of each experimental unit, not the treatment variables.
My goal is to create a treatment variable in df.data based on df.design.
In other words, df.data$treatment should equal df.design$treatment wherever df.data$ID == df.design$ID, even though length(df.data$ID) != length(df.design$ID).
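One way to express "take the treatment from the design row with the same ID" is base R's `match()`, which returns, for each ID in df.data, the position of that ID in df.design. The sketch below rebuilds only the rows shown in the `head()` output further down and assumes each ID appears exactly once in df.design; it is an illustration, not the only approach.

```r
# Example tables rebuilt from the head() output in the question
df.design <- data.frame(
  ID = 1:6,
  treatment = c("No heat - Keep all foliage", "Heat - Remove new foliage",
                "No heat - Remove old foliage", "No heat - Remove new foliage",
                "Heat - Remove old foliage", "Heat - Keep all foliage"),
  stringsAsFactors = FALSE
)
df.data <- data.frame(
  obs = c(1, 13, 12, 14, 15, 16),
  ID = c(1, 2, 2, 3, 4, 5),
  subsample = c("New", "New", "Mature", "Mature", "Mature", "Mature"),
  A = c(1.3, 3.3, 1.1, 3.8, 3.4, 2.0),
  stringsAsFactors = FALSE
)

# For each df.data row, the row number in df.design with the same ID
# (NA if that ID is not in the design)
idx <- match(df.data$ID, df.design$ID)

# Look up the treatment by that row number; repeated IDs simply reuse it
df.data$treatment <- df.design$treatment[idx]
```

Because the lookup goes by value rather than position, the two tables can have different lengths. If dplyr is already loaded, `dplyr::left_join(df.data, df.design, by = "ID")` performs the same key-based lookup and adds every design column at once.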
Because this scenario comes up with very different data sets, I'm hoping for a universal solution, or at least one that is easy to adapt between scenarios.
I have tried solutions along the lines of:
```r
df.data$treatment <- case_when(
  df.data$ID[i] == df.design$ID[i] ~ df.design$treatment[i])
```
But this returns <NA> values.
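A likely reason for the <NA> values: indexing both ID columns with the same `i` compares entries by position (row 3 of df.data against row 3 of df.design), not by ID. Whenever that comparison is FALSE, `case_when()` has no matching branch and no default, so it yields NA. The sketch below uses the ID values from the `head()` output; the treatment labels `"trt 1"` and so on are hypothetical placeholders, and base R's `merge()` stands in as one example of a join that matches on the ID value instead.

```r
# IDs taken from the head() output in the question
design_ids <- 1:6
data_ids <- c(1, 2, 2, 3, 4, 5)

# Position-by-position comparison: row 3 of the data (ID 2) is compared
# with row 3 of the design (ID 3), and so on
positional <- data_ids == design_ids
print(positional)  # → TRUE TRUE FALSE FALSE FALSE FALSE

# Hypothetical placeholder treatments, one per design ID
df.design <- data.frame(ID = design_ids, treatment = paste("trt", design_ids),
                        stringsAsFactors = FALSE)
df.data <- data.frame(ID = data_ids, A = c(1.3, 3.3, 1.1, 3.8, 3.4, 2.0))

# Join on the ID value; all.x = TRUE keeps df.data rows whose ID is missing
# from the design (their treatment becomes NA)
merged <- merge(df.data, df.design, by = "ID", all.x = TRUE)
```

Note that `merge()` sorts the result by the key column by default, so the rows may not come back in the original df.data order.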
Example df.design:
```
> head(df.design)
  ID                    treatment
1  1   No heat - Keep all foliage
2  2    Heat - Remove new foliage
3  3 No heat - Remove old foliage
4  4 No heat - Remove new foliage
5  5    Heat - Remove old foliage
6  6      Heat - Keep all foliage
```
Example df.data:
```
> head(df.data)
  obs ID subsample   A
1   1  1       New 1.3
2  13  2       New 3.3
3  12  2    Mature 1.1
4  14  3    Mature 3.8
5  15  4    Mature 3.4
6  16  5    Mature 2.0
```
You'll see that some, but not all, IDs have multiple measurements ("A"). These measurements come from an instrument that outputs the data with the columns "obs", "ID", and "subsample".
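Since the same design-to-data lookup recurs across experiments, it could be wrapped in a small reusable function. `add_design_vars()` below is a hypothetical helper, not part of any package: it copies every non-key column of the design onto the data by matching on a key column (assumed to be `"ID"` by default), keeps the data's row order and row count (so repeated measurements per ID are preserved), stops if the design has duplicated keys, and warns about IDs that are missing from the design.

```r
# Hypothetical helper (not from any package): copy every non-key column of
# `design` onto `data`, matching rows on the key column `by`.
add_design_vars <- function(data, design, by = "ID") {
  # A key that appears twice in the design would make the lookup ambiguous
  if (anyDuplicated(design[[by]]) > 0) {
    stop("duplicated '", by, "' values in design")
  }
  # Row of `design` for each row of `data` (NA if the ID is not in the design)
  idx <- match(data[[by]], design[[by]])
  unmatched <- unique(data[[by]][is.na(idx)])
  if (length(unmatched) > 0) {
    warning("IDs not found in design: ", paste(unmatched, collapse = ", "))
  }
  # Copy each design column; an existing column with the same name is overwritten
  for (col in setdiff(names(design), by)) {
    data[[col]] <- design[[col]][idx]
  }
  data  # same rows, same order, plus the design columns
}

# Example tables rebuilt from the head() output in the question
df.design <- data.frame(
  ID = 1:6,
  treatment = c("No heat - Keep all foliage", "Heat - Remove new foliage",
                "No heat - Remove old foliage", "No heat - Remove new foliage",
                "Heat - Remove old foliage", "Heat - Keep all foliage"),
  stringsAsFactors = FALSE
)
df.data <- data.frame(
  obs = c(1, 13, 12, 14, 15, 16),
  ID = c(1, 2, 2, 3, 4, 5),
  subsample = c("New", "New", "Mature", "Mature", "Mature", "Mature"),
  A = c(1.3, 3.3, 1.1, 3.8, 3.4, 2.0),
  stringsAsFactors = FALSE
)

df.data <- add_design_vars(df.data, df.design, by = "ID")
```

For a different experiment, only the `by` argument (the name of the shared ID column) should need to change; any additional design columns, such as separate factor columns, would be carried over automatically.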