First of all, thank you to whoever takes time out of their day to assist me. I have data on individuals' medical visits (4 visits per individual). I want to study the effect of 3 of the study variables, but to do that I have to convert the data into long format in a specific way. Take the blood pressure variable (BP) as an example. It was measured at each follow-up visit, so I have columns labeled BP_fu1, BP_fu2, etc. I would like to have only one column for each variable that has follow-up information: one BP column, one heart rate column, and one memory score column. Additionally, I want a follow-up column holding the follow-up number that corresponds to each measurement. No matter what I try, I cannot get all 3 variables to match the correct follow-up numbers.
I have tried using the pivot_longer() function on each variable one at a time (first on all 3 memory variables, then on the heart rate variable, and lastly on the BP variable). Here is some of the code I'm using; I only changed the variable names:
library(dplyr)
library(tidyr)

df_TIME <- df %>%
  mutate(time_years = as.numeric(HR_FU1) - as.numeric(HR_FU3))

# Pivot the memory columns first, keeping the follow-up number
df_long_final <- df_TIME %>%
  mutate(across(starts_with("COMP_MEM"), as.numeric)) %>%
  pivot_longer(cols = starts_with("COMP_MEM"), names_to = "follow_up", values_to = "comp_mem") %>%
  mutate(follow_up = gsub("^COMP_MEM_FU", "", follow_up))
# Set CRP to NA for selected BP codes
df_long_final$CRP[df_long_final$BP=="-96"] <- NA
df_long_final$CRP[df_long_final$BP=="-99"] <- NA
df_long_final$CRP[df_long_final$BP=="83"] <- NA
df_long_final1 <- df_long_final %>%
  pivot_longer(cols = starts_with("HR_FU"),
               names_to = "temp",
               values_to = "HR_FU",
               names_pattern = "HR_FU(\\d+)") %>%
  select(-temp) %>%
  arrange(famnr, follow_up)

df_long_final1 <- df_long_final %>%
  pivot_longer(cols = starts_with("BP"),
               names_to = "temp",
               values_to = "BP",
               names_pattern = "BP(\\d+)") %>%
  select(-temp) %>%
  arrange(famnr, follow_up)
Edit: per a suggestion, I'm adding a sample of my data in a code block.
> dput(head(df))
structure(list(COMP_MEM_FU1 = c(-0.1278462, 0.651491203, -0.523100556,
-0.577777305, -1108359738, -0.623475318), COMP_MEM_FU2 = c("-0.08989273",
"0.89857704", "-0.073899931", "0.15776524", NA, NA), COMP_MEM_FU3 = c("-0.10930318",
"0.033529036", "0.116955388", "-0.356199591", NA, NA), bp_FU1 = c(1,
1, 1, 1, 3, 1), bp_FU2 = c(1, 1, 1, 1, NA, NA), bp_FU3 = c(1.1,
0.5, 0.4, 1, NA, NA), AGE = c("71.0", "71.0", "65.5", "65.5",
"78.1", "78.1"), heart_rate = c(70.9, 70.9, 65.5, 65.5, 77.9,
77.9), heart_rate_FU1 = c(73.1, 73.1, 67.7, 67.7, 80.2, 80.2),
heart_rate_FU2 = c(75.3, 75.3, 69.9, 69.9, NA, NA), heart_rate_FU3 = c(77.7,
77.7, 72.3, 72.3, NA, NA), GENDER = c(0, 0, 1, 1, 1, 1),
EDU_YEARS = c(13, 17, 9, 9, 8, 9), famnr = c(1, 1,
2, 2, 3, 3), binding = c("0.116957518", "0.134414065",
"0.040922909", "0.058799312", "0.273736362", "0.065468945"
), id = c("id_301", "id_302", "id_303", "id_304", "id_305",
"id_306"), Twin_Number = c(1, 2, 1, 2, 1, 2)), row.names = c(NA,
6L), class = "data.frame")
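
For reference, the shape I'm after could probably be produced in a single pivot instead of three separate ones. This is only a minimal sketch: it assumes every follow-up column is named `<variable>_FU<number>`, as in the sample, and it uses a small toy data frame with made-up values rather than my real data. The `.value` keyword in `names_to` tells `pivot_longer()` to use the first captured part of each name as the output column, so the follow-up number stays aligned across all three variables.

```r
library(dplyr)
library(tidyr)

# Toy data shaped like the sample above (values are made up for illustration)
df <- data.frame(
  id             = c("id_301", "id_302"),
  famnr          = c(1, 1),
  COMP_MEM_FU1   = c(-0.13, 0.65),
  COMP_MEM_FU2   = c("-0.09", "0.90"),  # stored as character, as in the sample
  COMP_MEM_FU3   = c("-0.11", NA),
  bp_FU1         = c(1, 1),
  bp_FU2         = c(1, 1),
  bp_FU3         = c(1.1, 0.5),
  heart_rate_FU1 = c(73.1, 73.1),
  heart_rate_FU2 = c(75.3, 75.3),
  heart_rate_FU3 = c(77.7, 77.7)
)

df_long <- df %>%
  # Make every follow-up column numeric so columns of one variable can be stacked
  mutate(across(matches("_FU\\d+$"), as.numeric)) %>%
  pivot_longer(
    cols = matches("_FU\\d+$"),
    # ".value": the part before _FU becomes the output column name;
    # the captured digits go into the follow_up column
    names_to = c(".value", "follow_up"),
    names_pattern = "(.*)_FU(\\d+)$"
  ) %>%
  mutate(follow_up = as.integer(follow_up)) %>%
  arrange(famnr, id, follow_up)

print(df_long)
```

The result has one row per individual per follow-up, with one column each for `COMP_MEM`, `bp`, and `heart_rate`. One caveat: the real data also has a baseline `heart_rate` column with no `_FU` suffix, which would likely clash with the new `heart_rate` column, so it would need to be renamed or dropped before pivoting.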