I have a dataset in a one-row-per-person format, but I want to merge in a dataset as a matrix variable -- basically, if not for the need of the tidy format, I would use a fuzzy join to merge in data from a range of dates, creating n x 3 rows. Instead, I really want to merge them in as a matrix variable.
I've tried using the fuzzy join followed by a nest function, but my merge dataset contains several variables, so needing to explicitly state my variables for the nest function would be a ton of work and ultimately look messy (though, I suppose if this is my only way forward, I could commit to this).
By way of providing a reproducible example of what I am thinking, I've created the fake dataset below:
person_data <- tibble(ID=paste0("ID", 1:50), outcome=sample(0:1, 50, T), date=sample(5:145, 50, F),min_date=date-5)
value_data <- tibble(date=seq(1:150), exp1=sample(20:100, 150, T), exp2=sample(20:100, 150, T))
My first thought was to combine fuzzy join and the nest function like so:
merged <- fuzzyjoin::fuzzy_left_join(person_data, value_data, by=c("date" = "date", "min_date" = "date"), match_fun=c(`>=`,`<=`))
analysis_data <- merged %>%
select(-date.y) %>%
group_by(ID) %>%
nest(exp1=exp1, exp2=exp2)
The head of the resulting analysis_data
tibble is here.
> analysis_data
# A tibble: 50 × 6
# Groups: ID [50]
ID outcome date.x min_date exp1 exp2
<chr> <int> <int> <dbl> <list> <list>
1 ID1 1 39 34 <tibble [6 × 1]> <tibble [6 × 1]>
2 ID2 1 87 82 <tibble [6 × 1]> <tibble [6 × 1]>
3 ID3 0 23 18 <tibble [6 × 1]> <tibble [6 × 1]>
4 ID4 1 60 55 <tibble [6 × 1]> <tibble [6 × 1]>
5 ID5 1 120 115 <tibble [6 × 1]> <tibble [6 × 1]>
6 ID6 0 35 30 <tibble [6 × 1]> <tibble [6 × 1]>
7 ID7 0 131 126 <tibble [6 × 1]> <tibble [6 × 1]>
8 ID8 0 83 78 <tibble [6 × 1]> <tibble [6 × 1]>
9 ID9 0 68 63 <tibble [6 × 1]> <tibble [6 × 1]>
10 ID10 1 133 128 <tibble [6 × 1]> <tibble [6 × 1]>
And for the most part, it kind of does get me there because the resulting data has 50 rows but is there a better/simpler way of approaching this problem, especially if I had exp1-50 instead of exp1 and exp2? I'd also like to add an index
matrix variable, but I'm unsure how to add that.
Realizing it doesn't fit with the rest of my examples, I basically want a dataset that looks like this in the end:
fake_data_goal <- tibble(ID=paste0("ID", 1:50), outcome=sample(0:1, 50, T), var=matrix(data=sample(50:100, 50*3, T), nrow=50), index=matrix(rep(c(1:3), 50), nrow=50, byrow=T))
> fake_data_goal
# A tibble: 50 × 4
ID outcome var[,1] [,2] [,3] index[,1] [,2] [,3]
<chr> <int> <int> <int> <int> <int> <int> <int>
1 ID1 0 66 90 71 1 2 3
2 ID2 0 95 98 75 1 2 3
3 ID3 0 57 84 91 1 2 3
4 ID4 1 78 89 64 1 2 3
5 ID5 1 97 73 95 1 2 3
6 ID6 0 52 96 76 1 2 3
7 ID7 0 62 93 57 1 2 3
8 ID8 0 62 76 81 1 2 3
9 ID9 1 55 58 67 1 2 3
10 ID10 0 81 91 91 1 2 3
Thank you so much in advance!