2

I have data that looks something like:

patientid <- c(100,101,101,101,102,102)
weight <- c(1,1,2,3,1,2)
height <- c(0,6,0,0,0,1)
bmi <- c(0,5,0,0,0,1)

I want to group patient id so that there is only 1 patient per row in the dataframe.
Then put the other rows as additional columns (named by adding a number on the end). So the dataframe would be patientid, weight1, height1, bmi1, weight2, height2, bmi2, etc. The number of columns would correspond to how many repeated patient ids there were.

I assume group_by and spread are the key functions but I can't figure it out. In this example the row with patient id 101 would just have values in columns height1, bmi1 and weight1, patient 101 would have values in weight1, height1, bmi1, weight2, height2, bmi2, weight3, height3, bmi3 and patient 102 would have values in weight1, height1, bmi1, weight2, height2, bmi2.

ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81

3 Answers3

4

A base R option using ave + reshape

reshape(
  transform(
    df,
    q = ave(patientid, patientid, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "patientid",
  timevar = "q"
)

gives

  patientid weight.1 height.1 bmi.1 weight.2 height.2 bmi.2 weight.3 height.3
1       100        1        0     0       NA       NA    NA       NA       NA
2       101        1        6     5        2        0     0        3        0
5       102        1        0     0        2        1     1       NA       NA
  bmi.3
1    NA
2     0
5    NA
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
3

Perhaps, we can use pivot_wider after creating a sequence column by 'patientid'

library(tidyr)
library(data.table)
library(dplyr)
df1 %>% 
    mutate(rn  = rowid(patientid)) %>% 
    pivot_wider(names_from = rn, values_from = c(weight, height, bmi),
         names_sep="")

Output:

# A tibble: 3 x 10
  patientid weight1 weight2 weight3 height1 height2 height3  bmi1  bmi2  bmi3
      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>
1       100       1      NA      NA       0      NA      NA     0    NA    NA
2       101       1       2       3       6       0       0     5     0     0
3       102       1       2      NA       0       1      NA     0     1    NA

Data:

df1 <- data.frame(patientid, weight, height, bmi)
Hernando Abella
  • 286
  • 2
  • 13
akrun
  • 874,273
  • 37
  • 540
  • 662
1

group_by and spread would part of the tidyverse, I think.

I reshaped your data with base reshape and used weight as measurement id.


patientid <- c(100,101,101,101,102,102)
weight <- c(1,1,2,3,1,2)
height <- c(0,6,0,0,0,1)
bmi <- c(0,5,0,0,0,1)

cat("data\n")
df <- data.frame(patientid = patientid,
                 n = weight,
                 weight = weight,
                 height = height,
                 bmi = bmi)
df

cat("reshaped to wid format\n")
reshape(data = df,
        idvar = "patientid",
        timevar = "n",
        # c("weight", "height", "bmi"),
        direction = "wide")

#?reshape()
Hernando Abella
  • 286
  • 2
  • 13