I am currently changing from Stata to R and would appreciate some help with the following problem.
I am analyzing different treatment durations (in months) and am trying to generate a dummy variable for each month, where 0 = no treatment and 1 = treatment.
I have variables with the total number of months in treatment for each treatment episode (dur_t) and variables for the time between treatment episodes (dur_nt). Here are some random data that look roughly like what I have (I can't share mine).
library(dplyr)  # needed for na_if()
set.seed(10000)
id <- 1:100
dur_t1 <- round(runif(n = 100, min = 1, max = 12),0)
dur_nt1 <- round(runif(n = 100, min = 1, max = 12),0)
dur_t2 <- round(runif(n = 100, min = 1, max = 12),0)
df <- data.frame(id,dur_t1,dur_nt1,dur_t2)
df$dur_nt1 <- na_if(df$dur_nt1, 7)
df$dur_nt1 <- na_if(df$dur_nt1, 3)
df$dur_t2 <- na_if(df$dur_t2, 11)
df$dur_t2 <- na_if(df$dur_t2, 5)
df$dur_t2[is.na(df$dur_nt1)] <- NA
So my data looks something like this:
id | dur_t1 | dur_nt1 | dur_t2 |
---|---|---|---|
1 | 1 | 0 | 5 |
2 | 3 | 3 | 2 |
3 | 1 | NA | NA |
4 | 2 | 2 | 2 |
5 | 5 | 2 | 1 |
And I would like to have something like this:
id | dur_t1 | dur_nt1 | dur_t2 | month1 | month2 | month3 | month4 | month5 | month6 | month7 | month8 |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | NA | NA |
2 | 3 | 3 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
3 | 1 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA |
4 | 2 | 2 | 2 | 1 | 1 | 0 | 0 | 1 | 1 | NA | NA |
5 | 5 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
As you can see in the table:
- First row: the first treatment episode lasted 1 month (dur_t1=1), so month1=1. The individual started a new treatment during the same month, so the no-treatment duration (dur_nt1) equals 0 and no month is coded 0. The second treatment lasted 5 months (dur_t2=5), so month2-month6 are each coded 1. Finally, month7 onward should be NA for that case.
- Second row: dur_t1=3, so month1-month3 are coded 1. The no-treatment variable (dur_nt1) equals 3, so month4-month6 are coded 0. Lastly, dur_t2=2, so month7-month8 are coded 1.
- Third row: there was just 1 treatment episode lasting 1 month (dur_t1=1), so every month after month1 is NA.
And so on. I have 85000 observations in my dataset.
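To make the rule above concrete, here is a minimal base R sketch of the coding applied to the five example rows. It is only an illustration under some assumptions, not necessarily the best approach: the helper name `expand_months`, the `values` vector (1 for treatment, 0 for no treatment, in column order) and the fixed `n_months = 8` are made up for this example, and expansion is assumed to stop at the first NA duration.

```r
# The five example rows from the tables above
ex <- data.frame(
  id      = 1:5,
  dur_t1  = c(1, 3, 1, 2, 5),
  dur_nt1 = c(0, 3, NA, 2, 2),
  dur_t2  = c(5, 2, NA, 2, 1)
)

# Hypothetical helper: turn one row of durations into a 0/1/NA vector.
# Each duration repeats its code (1 = treatment, 0 = no treatment);
# everything from the first NA duration onward is dropped.
expand_months <- function(durations, values, n_months) {
  first_na <- which(is.na(durations))[1]
  if (!is.na(first_na)) {
    keep      <- seq_len(first_na - 1)
    durations <- durations[keep]
    values    <- values[keep]
  }
  out <- rep(values, times = durations)
  length(out) <- n_months  # pads with NA (or truncates) to n_months
  out
}

dur_cols <- c("dur_t1", "dur_nt1", "dur_t2")  # in chronological order
values   <- c(1, 0, 1)                        # code for each duration column
n_months <- 8                                 # assumed number of month columns

# Apply the helper row by row; apply() returns months in rows, so transpose
months <- t(apply(as.matrix(ex[dur_cols]), 1, expand_months,
                  values = values, n_months = n_months))
colnames(months) <- paste0("month", seq_len(n_months))

result <- cbind(ex, months)
print(result)
```

On the full data, `n_months` would presumably need to be at least the largest row total of the duration columns, otherwise the later months get cut off.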
Thanks in advance!!