How to convert multiple rows per ID to Single row Per ID in R?

Question

I have a Patient Data of 9 months. The dataset includes 4 columns - Patient ID, Visit date, Systolic Blood Pressure (SBP), and Diastolic Blood Pressure (DBP). The patient ID is Unique to everyone. Each Patient has visited the clinic multiple times. Whenever the patient visited their SBP and DBP count was taken.

Sample dataset :

structure(list(PAT_ID = c("PAT134844", "PAT134845", "PAT122062", "PAT134846", "PAT110934", "PAT134844","PAT134845", "PAT134846", "PAT110934", "PAT134846", "PAT122062", "PAT134845", "PAT134844"), 
               VISIT.DATE = c("1/2/19", "1/2/19", "1/2/19", "1/3/19", "1/3/19", "1/4/19", "3/5/19", "4/5/19", "4/5/19", "6/7/19", "6/7/19", "8/8/19", "8/8/19"),
               BP_SYSTOLIC = c("108", "122", "126", "125", "167", "125", "111", "115", "120", "130", "125", "135", "110"),
               BP_DIASTOLIC = c("70", "82", "77", "85", "89", "71", "78", "75", "89", "64", "61", "68", "77")), class = "data.frame", row.names = c(NA, -13L))

I want my outcome table to be -

Patient ID	Date 1	SBP 1	DBP 1	Date 2	SBP 2	DBP 2	Date 3	so on..
CH1234	1 january	120	80	5 may	115	60	7 july
CH1235	2 january	130	90	6 june	110	70	8 august

Welcome to SO, Ray! (1) Stack's tag-recommendation system is imperfect, please verify the suggested tags are relevant. Here, [tag:rstudio] is reserved for the IDE itself regardless of the language; and [tag:excel] seems informative that you originall looked at the data in excel but does not change anything from the R side. (2) It helps immensely to have a reproducible question including sample data and code attempted. See https://stackoverflow.com/q/5963269 , [mcve], and https://stackoverflow.com/tags/r/info for discussions of (e.g.) `dput(head(x))` and similar. — r2evans, Jun 07 '23 at 20:03
(3) If I'm reading this correctly, then this seems like a reshape/pivot operation, where https://stackoverflow.com/q/2185252/3358272, https://stackoverflow.com/q/68058000/3358272 would be helpful. Good luck! — r2evans, Jun 07 '23 at 20:04
@r2evans Thank You so much! The reshape/pivot operation is not helping. I am new to the R. Is it possible for you to send me code of how it works. I tried transposing but the outcome is not what I want. — Ray, Jun 07 '23 at 20:21
Still (2), sample data please. (3) What did you try? It's often better to resolve the code issues you show, otherwise it seems as if this is a "free code service" (which it is not). — r2evans, Jun 07 '23 at 20:24
@r2evans Please find sample dataset in the question above. 3) I tried library(data.table ) and used transpose function. t_newdata <- transpose(newdata) View(t_newdata) It just rotated the dataset that's it. — Ray, Jun 07 '23 at 21:14

score 0 · Answer 1 · answered Jun 07 '23 at 21:30

The following code produces the columns you're looking for, assuming the dat is your data:

library(tidyverse)
dat |> 
  group_by(PAT_ID) |> 
  arrange(VISIT.DATE) |> 
  mutate(num_visit = paste0("Date: ", row_number())) |> 
  pivot_wider(names_from = num_visit,
              values_from = c(VISIT.DATE, BP_SYSTOLIC, BP_DIASTOLIC)) |> 
  ungroup()

which gives:

# A tibble: 5 × 10
# Groups:   PAT_ID [5]
  PAT_ID    `VISIT.DATE_Date: 1` `VISIT.DATE_Date: 2` `VISIT.DATE_Date: 3` `BP_SYSTOLIC_Date: 1` `BP_SYSTOLIC_Date: 2`
  <chr>     <chr>                <chr>                <chr>                <chr>                 <chr>                
1 PAT134844 1/2/19               1/4/19               8/8/19               108                   125                  
2 PAT134845 1/2/19               3/5/19               8/8/19               122                   111                  
3 PAT122062 1/2/19               6/7/19               NA                   126                   125                  
4 PAT134846 1/3/19               4/5/19               6/7/19               125                   115                  
5 PAT110934 1/3/19               4/5/19               NA                   167                   120                  
# ℹ 4 more variables: `BP_SYSTOLIC_Date: 3` <chr>, `BP_DIASTOLIC_Date: 1` <chr>, `BP_DIASTOLIC_Date: 2` <chr>,
#   `BP_DIASTOLIC_Date: 3` <chr>

You can reorder/rename the columns if you want.

score 0 · Answer 2 · answered Jun 07 '23 at 23:10

An approach using aggregate to group the data and unnest_wider from tidyr to get the wide format, followed by order to get the desired arrangement

library(dplyr)
library(tidyr)

res <- aggregate(. ~ PAT_ID, df, c) %>% 
  unnest_wider(-PAT_ID, names_sep="_")

cbind(res[1], res[-1][order(sub(".*_(\\d+)", "\\1", colnames(res)[-1]))])
     PAT_ID VISIT.DATE_1 BP_SYSTOLIC_1 BP_DIASTOLIC_1 VISIT.DATE_2
1 PAT110934       1/3/19           167             89       4/5/19
2 PAT122062       1/2/19           126             77       6/7/19
3 PAT134844       1/2/19           108             70       1/4/19
4 PAT134845       1/2/19           122             82       3/5/19
5 PAT134846       1/3/19           125             85       4/5/19
  BP_SYSTOLIC_2 BP_DIASTOLIC_2 VISIT.DATE_3 BP_SYSTOLIC_3 BP_DIASTOLIC_3
1           120             89         <NA>          <NA>           <NA>
2           125             61         <NA>          <NA>           <NA>
3           125             71       8/8/19           110             77
4           111             78       8/8/19           135             68
5           115             75       6/7/19           130             64

Hey! Thank You so much! It worked. Can you help me explain what exactly the code ``cbind(res[1], res[-1][order(sub(".*_(\\d+)", "\\1", colnames(res)[-1]))])`` did? I am new to the R. My dataset "res" does look like this in console but when I view it, it looks like this "Pat_ID", Visit date_1,Visit date_2 ,Visit date_3 ,Visit date_4, BP_Systolic_1, BP_Systolic_2,BP_Systolic_3,BP_Systolic_4,BP_Systolic_5, BP_Diastolic_1,BP_Diastolic_2,BP_Diastolic_3,BP_Diastolic_4,BP_Diastolic_5,BP_Diastolic_6,BP_Diastolic_7" etc. How can I reorder in the outcome expect it to be? — Ray, Jun 08 '23 at 18:03
@Ray Sure. `sub` removes everything but the numbers (\\d+) that were added by `unnest`. `order` gets the sorted indices and applies them to `res` without "PAT_ID" (that's the -1). Finally combine "PAT_ID" and "res" to get the desired output. Btw, you have to assign it to a new (or the old "res" if you wish) variable to save it, e.g. `res <- cbind(res[1], res[-1][order...`. — Andre Wildberg, Jun 08 '23 at 19:00
@Ray Great, glad it helped! As a last step you can choose between the answers and accept one that suits best. — Andre Wildberg, Jun 10 '23 at 16:27

score 0 · Answer 3 · answered Jun 08 '23 at 02:40

This is close to a dupe of How to reshape data from long to wide format and Reshape multiple value columns to wide format, though neither answers have multi-column names_from=.

dplyr

This is reshaping from long to wide.

library(dplyr)
library(tidyr)
quux %>%
  group_by(PAT_ID) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = PAT_ID, names_from = "rn",
              values_from = c("VISIT.DATE", "BP_SYSTOLIC", "BP_DIASTOLIC"), 
              names_sep = "_") %>%
  rename_with(.cols = matches("[0-9]"),
    .fn = function(z) case_when(
      grepl("VISIT", z) ~ gsub(".*_", "Date_", z), 
      grepl("SYSTOLIC", z) ~ gsub(".*_", "SBP_", z), 
      grepl("DIASTOLIC", z) ~ gsub(".*_", "DBP_", z))
  )
# # A tibble: 5 × 10
#   PAT_ID    Date_1 Date_2 Date_3 SBP_1 SBP_2 SBP_3 DBP_1 DBP_2 DBP_3
#   <chr>     <chr>  <chr>  <chr>  <chr> <chr> <chr> <chr> <chr> <chr>
# 1 PAT134844 1/2/19 1/4/19 8/8/19 108   125   110   70    71    77   
# 2 PAT134845 1/2/19 3/5/19 8/8/19 122   111   135   82    78    68   
# 3 PAT122062 1/2/19 6/7/19 NA     126   125   NA    77    61    NA   
# 4 PAT134846 1/3/19 4/5/19 6/7/19 125   115   130   85    75    64   
# 5 PAT110934 1/3/19 4/5/19 NA     167   120   NA    89    89    NA

data.table

library(data.table)
dcast(as.data.table(quux)[, n := seq(.N), PAT_ID], PAT_ID ~ n, value.var = c("VISIT.DATE", "BP_SYSTOLIC", "BP_DIASTOLIC"))
#       PAT_ID VISIT.DATE_1 VISIT.DATE_2 VISIT.DATE_3 BP_SYSTOLIC_1 BP_SYSTOLIC_2 BP_SYSTOLIC_3 BP_DIASTOLIC_1
#       <char>       <char>       <char>       <char>        <char>        <char>        <char>         <char>
# 1: PAT110934       1/3/19       4/5/19         <NA>           167           120          <NA>             89
# 2: PAT122062       1/2/19       6/7/19         <NA>           126           125          <NA>             77
# 3: PAT134844       1/2/19       1/4/19       8/8/19           108           125           110             70
# 4: PAT134845       1/2/19       3/5/19       8/8/19           122           111           135             82
# 5: PAT134846       1/3/19       4/5/19       6/7/19           125           115           130             85
# 2 variables not shown: [BP_DIASTOLIC_2 <char>, BP_DIASTOLIC_3 <char>]

and you can rename similar to above.

Data

quux <- structure(list(PAT_ID = c("PAT134844", "PAT134845", "PAT122062", "PAT134846", "PAT110934", "PAT134844", "PAT134845", "PAT134846", "PAT110934", "PAT134846", "PAT122062", "PAT134845", "PAT134844"), VISIT.DATE = c("1/2/19", "1/2/19", "1/2/19", "1/3/19", "1/3/19", "1/4/19", "3/5/19", "4/5/19", "4/5/19", "6/7/19", "6/7/19", "8/8/19", "8/8/19"), BP_SYSTOLIC = c("108", "122", "126", "125", "167", "125", "111", "115", "120", "130", "125", "135", "110"), BP_DIASTOLIC = c("70", "82", "77", "85", "89", "71",  "78", "75", "89", "64", "61", "68", "77")), class = "data.frame", row.names = c(NA, -13L))

Heya! Thank you so much for the code! Both the codes work! But I don't know why I can't View or print the data outcome which is available in the console. When I use View() function it provides or opens the raw data file. — Ray, Jun 09 '23 at 01:28
Yes, I did assign it to the new object. I guess there was some glitch. It is working now! Thank you so much. I have one more question. Both the results have visit dates first followed by BP Systolic, and BP DIastolic. I want to have it like Visit date 1 followed by BP systolic 1 and BP Diastolic 1, Visit date 2, SBP 2, DBP 2 etc. How do I reorder the entire dataset in easy way? Because I have 25,575 observations rows and 109 variables! — Ray, Jun 10 '23 at 16:04
Nothing easy/fancy. If we assign the dplyr output above to `out`, and create `cn <- colnames(out)`, then we can do `out[,cn[order(cn != "PAT_ID", suppressWarnings(as.integer(gsub(".*[^0-9]([0-9]+)$", "\\1", cn))))]]` to do the order you said. — r2evans, Jun 10 '23 at 17:43
It did work but in console. When I view my data using View() it's still the same. But in console I can see the desired output. I am curious to know why is it so? Thank you so much for your constant support. — Ray, Jun 10 '23 at 18:48
Yes I did assign. I think there was some glitch. Had to download the latest version of R. Sir! THANK YOU SO MUCH!! you have helped me a lot. You made so easy for me. I have similar thing to do with 1 year patient data. Thank you so much. I appreciate your help and constant support. Also thank you for teaching me new things too. — Ray, Jun 10 '23 at 22:55

How to convert multiple rows per ID to Single row Per ID in R?

3 Answers3

dplyr

data.table