How do I create a long data frame for my data?

Question

First of all, thank you to whoever takes time out of their day to assist me. I have data on individuals' medical visits (4 visits per individual). I want to study the effect of 3 of the study variables, but to do that I have to convert the data into a long format in a specific way. Take the blood pressure variable (BP) as an example. It was measured at each follow-up visit, so I have columns labeled BP_fu1, BP_fu2, etc. I would like only one column for each variable that has follow-up information: one BP column, one heart rate column, and one memory score column. I also want a follow-up column holding the follow-up number that each value belongs to. No matter what I try, I cannot get all 3 variables to line up with the correct follow-up numbers.

I have tried using the `pivot_longer` function on each variable one at a time (first on all 3 memory variables, then on the heart rate variable, and lastly on the BP variable). Here's some of the code I'm using; I just changed the variable names:

df_TIME <- df %>%
  mutate(time_years = as.numeric(HR_FU1) - as.numeric(HR_FU3))

df_long_final <- df_TIME %>%
       mutate(across(starts_with("COMP_MEM"), as.numeric)) %>%
       pivot_longer(cols = starts_with("COMP_MEM"), names_to = "follow_up", values_to = "comp_mem") %>%
       mutate(follow_up = gsub("^COMP_MEM_FU", "", follow_up))
 
# Output dataframe
df_long_final$CRP[df_long_final$BP=="-96"] <- NA

df_long_final$CRP[df_long_final$BP=="-99"] <- NA

df_long_final$CRP[df_long_final$BP=="83"] <- NA

df_long_final1 <- df_long_final %>%
  pivot_longer(cols = starts_with("HR_FU"),
               names_to = "temp",
               values_to = "HR_FU",
               names_pattern = "HR_FU(\\d+)") %>%
  select(-temp) %>%
  arrange(famnr, follow_up)
#
#
df_long_final1 <- df_long_final %>%
  pivot_longer(cols = starts_with("BP"),
               names_to = "temp",
               values_to = "BP",
               names_pattern = "BP(\\d+)") %>%
  select(-temp) %>%
  arrange(famnr, follow_up)

Edit: per the suggestion, I'm adding this sample of my data in a code block.

   > dput(head(df))
structure(list(COMP_MEM_FU1 = c(-0.1278462, 0.651491203, -0.523100556, 
-0.577777305, -1108359738, -0.623475318), COMP_MEM_FU2 = c("-0.08989273", 
"0.89857704", "-0.073899931", "0.15776524", NA, NA), COMP_MEM_FU3 = c("-0.10930318", 
"0.033529036", "0.116955388", "-0.356199591", NA, NA), bp_FU1 = c(1, 
1, 1, 1, 3, 1), bp_FU2 = c(1, 1, 1, 1, NA, NA), bp_FU3 = c(1.1, 
0.5, 0.4, 1, NA, NA), AGE = c("71.0", "71.0", "65.5", "65.5", 
"78.1", "78.1"), heart_rate = c(70.9, 70.9, 65.5, 65.5, 77.9, 
77.9), heart_rate_FU1 = c(73.1, 73.1, 67.7, 67.7, 80.2, 80.2), 
   heart_rate_FU2 = c(75.3, 75.3, 69.9, 69.9, NA, NA), heart_rate_FU3 = c(77.7, 
    77.7, 72.3, 72.3, NA, NA), GENDER = c(0, 0, 1, 1, 1, 1), 
    EDU_YEARS = c(13, 17, 9, 9, 8, 9), famnr = c(1, 1, 
    2, 2, 3, 3), binding = c("0.116957518", "0.134414065", 
    "0.040922909", "0.058799312", "0.273736362", "0.065468945"
    ), id = c("id_301", "id_302", "id_303", "id_304", "id_305", 
    "id_306"), Twin_Number = c(1, 2, 1, 2, 1, 2)), row.names = c(NA, 
6L), class = "data.frame")

Please provide a sample of your data. See https://stackoverflow.com/q/5963269 , [mcve], and https://stackoverflow.com/tags/r/info for discussions on the use of `dput`, `data.frame`, or `read.table`. Thanks! — r2evans, May 26 '23 at 13:03
I'm not sure how to supply the sample of my data; this isn't really explained well — Nick Caruana, May 26 '23 at 13:24
run `dput(head(df))`, [edit] your question, and paste the output in a [code block] — r2evans, May 26 '23 at 13:33
Thank you for the suggestion. I've edited the initial post to contain the output. Is this sufficient, or should I supply something else from my end? — Nick Caruana, May 26 '23 at 14:08
It's a good start, but ... (1) your first code block references `HR_FU1` (among others) that is not present in `df`, is that supposed to be `heartrate_FU1`? (2) `df_long_final$CRP` is not found. (3) Not entirely sure (since I can't go through the stepwise code) what the end result should look like. Are you able to provide a sample (`data.frame(..)`) resembling what the desired output is? — r2evans, May 26 '23 at 14:21

score 0 · Answer 1 · answered May 26 '23 at 14:46

library(tidyverse)

df <- structure(list(
  COMP_MEM_FU1 = c(-0.1278462, 0.651491203, -0.523100556,
                   -0.577777305, -1108359738, -0.623475318),
  COMP_MEM_FU2 = c("-0.08989273", "0.89857704", "-0.073899931",
                   "0.15776524", NA, NA),
  COMP_MEM_FU3 = c("-0.10930318", "0.033529036", "0.116955388",
                   "-0.356199591", NA, NA),
  bp_FU1 = c(1, 1, 1, 1, 3, 1),
  bp_FU2 = c(1, 1, 1, 1, NA, NA),
  bp_FU3 = c(1.1, 0.5, 0.4, 1, NA, NA),
  AGE = c("71.0", "71.0", "65.5", "65.5", "78.1", "78.1"),
  heart_rate = c(70.9, 70.9, 65.5, 65.5, 77.9, 77.9),
  heart_rate_FU1 = c(73.1, 73.1, 67.7, 67.7, 80.2, 80.2),
  heart_rate_FU2 = c(75.3, 75.3, 69.9, 69.9, NA, NA),
  heart_rate_FU3 = c(77.7, 77.7, 72.3, 72.3, NA, NA),
  GENDER = c(0, 0, 1, 1, 1, 1),
  EDU_YEARS = c(13, 17, 9, 9, 8, 9),
  famnr = c(1, 1, 2, 2, 3, 3),
  binding = c("0.116957518", "0.134414065", "0.040922909",
              "0.058799312", "0.273736362", "0.065468945"),
  id = c("id_301", "id_302", "id_303", "id_304", "id_305", "id_306"),
  Twin_Number = c(1, 2, 1, 2, 1, 2)
), row.names = c(NA, 6L), class = "data.frame")
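df |> 
  # convert everything except id to numeric (several columns are stored as character)
  mutate(across(-id, as.numeric)) |> 
  # stack all follow-up columns into name/value pairs
  pivot_longer(contains("FU")) |> 
  # pull the visit number out of the column name, then drop it from the name
  mutate(visit = parse_number(name), name = str_remove(name, "1|2|3")) |> 
  # spread back out: one column per variable, one row per id and visit
  pivot_wider()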
#> # A tibble: 18 × 12
#>      AGE heart_rate GENDER EDU_YEARS famnr binding id     Twin_Number visit
#>    <dbl>      <dbl>  <dbl>     <dbl> <dbl>   <dbl> <chr>        <dbl> <dbl>
#>  1  71         70.9      0        13     1  0.117  id_301           1     1
#>  2  71         70.9      0        13     1  0.117  id_301           1     2
#>  3  71         70.9      0        13     1  0.117  id_301           1     3
#>  4  71         70.9      0        17     1  0.134  id_302           2     1
#>  5  71         70.9      0        17     1  0.134  id_302           2     2
#>  6  71         70.9      0        17     1  0.134  id_302           2     3
#>  7  65.5       65.5      1         9     2  0.0409 id_303           1     1
#>  8  65.5       65.5      1         9     2  0.0409 id_303           1     2
#>  9  65.5       65.5      1         9     2  0.0409 id_303           1     3
#> 10  65.5       65.5      1         9     2  0.0588 id_304           2     1
#> 11  65.5       65.5      1         9     2  0.0588 id_304           2     2
#> 12  65.5       65.5      1         9     2  0.0588 id_304           2     3
#> 13  78.1       77.9      1         8     3  0.274  id_305           1     1
#> 14  78.1       77.9      1         8     3  0.274  id_305           1     2
#> 15  78.1       77.9      1         8     3  0.274  id_305           1     3
#> 16  78.1       77.9      1         9     3  0.0655 id_306           2     1
#> 17  78.1       77.9      1         9     3  0.0655 id_306           2     2
#> 18  78.1       77.9      1         9     3  0.0655 id_306           2     3
#> # ℹ 3 more variables: COMP_MEM_FU <dbl>, bp_FU <dbl>, heart_rate_FU <dbl>

Created on 2023-05-26 with reprex v2.0.2
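
As a possible alternative to the longer-then-wider round trip above, `pivot_longer()` can usually do this in one step through the special `".value"` entry in `names_to`, which keeps the first part of each column name as its own output column. The sketch below is only an illustration under assumptions: the data frame `toy` and its values are made up to mirror the `<variable>_FU<number>` naming in the question, and `follow_up` is a column name chosen here.

```r
library(tidyr)

# Made-up data that mirrors the question's naming scheme
# (<variable>_FU<visit number>); the values are illustrative only.
toy <- data.frame(
  id             = c("id_301", "id_302"),
  famnr          = c(1, 1),
  COMP_MEM_FU1   = c(-0.13, 0.65),
  COMP_MEM_FU2   = c(-0.09, 0.90),
  bp_FU1         = c(1, 1),
  bp_FU2         = c(1, 1),
  heart_rate_FU1 = c(73.1, 73.1),
  heart_rate_FU2 = c(75.3, 75.3)
)

toy_long <- toy |>
  pivot_longer(
    # only the columns that end in _FU followed by digits
    cols = matches("_FU\\d+$"),
    # ".value": the first capture group becomes the output column name;
    # the second capture group goes into a follow_up column
    names_to = c(".value", "follow_up"),
    names_pattern = "^(.*)_FU(\\d+)$",
    # store the follow-up number as an integer rather than a string
    names_transform = list(follow_up = as.integer)
  )

toy_long
```

Each `id` ends up with one row per follow-up, and `COMP_MEM`, `bp`, and `heart_rate` stay aligned because they are reshaped in the same call. With the real data, the character columns would still need converting to numeric first, as in the `mutate(across(...))` step above.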
