group by and then expand to add new columns

Question

I have data that looks something like:

patientid <- c(100,101,101,101,102,102)
weight <- c(1,1,2,3,1,2)
height <- c(0,6,0,0,0,1)
bmi <- c(0,5,0,0,0,1)

I want to group patient id so that there is only 1 patient per row in the dataframe.
Then put the other rows as additional columns (named by adding a number on the end). So the dataframe would be patientid, weight1, height1, bmi1, weight2, height2, bmi2, etc. The number of columns would correspond to how many repeated patient ids there were.

I assume group_by and spread are the key functions but I can't figure it out. In this example the row with patient id 101 would just have values in columns height1, bmi1 and weight1, patient 101 would have values in weight1, height1, bmi1, weight2, height2, bmi2, weight3, height3, bmi3 and patient 102 would have values in weight1, height1, bmi1, weight2, height2, bmi2.

score 4 · Answer 1 · answered Aug 06 '21 at 15:09

A base R option using ave + reshape

reshape(
  transform(
    df,
    q = ave(patientid, patientid, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "patientid",
  timevar = "q"
)

gives

  patientid weight.1 height.1 bmi.1 weight.2 height.2 bmi.2 weight.3 height.3
1       100        1        0     0       NA       NA    NA       NA       NA
2       101        1        6     5        2        0     0        3        0
5       102        1        0     0        2        1     1       NA       NA
  bmi.3
1    NA
2     0
5    NA

score 3 · Answer 2 · edited Aug 06 '21 at 16:26

Perhaps, we can use pivot_wider after creating a sequence column by 'patientid'

library(tidyr)
library(data.table)
library(dplyr)
df1 %>% 
    mutate(rn  = rowid(patientid)) %>% 
    pivot_wider(names_from = rn, values_from = c(weight, height, bmi),
         names_sep="")

Output:

# A tibble: 3 x 10
  patientid weight1 weight2 weight3 height1 height2 height3  bmi1  bmi2  bmi3
      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>
1       100       1      NA      NA       0      NA      NA     0    NA    NA
2       101       1       2       3       6       0       0     5     0     0
3       102       1       2      NA       0       1      NA     0     1    NA

Data:

df1 <- data.frame(patientid, weight, height, bmi)

score 1 · Answer 3 · answered Aug 06 '21 at 16:16

group_by and spread would part of the tidyverse, I think.

I reshaped your data with base reshape and used weight as measurement id.


patientid <- c(100,101,101,101,102,102)
weight <- c(1,1,2,3,1,2)
height <- c(0,6,0,0,0,1)
bmi <- c(0,5,0,0,0,1)

cat("data\n")
df <- data.frame(patientid = patientid,
                 n = weight,
                 weight = weight,
                 height = height,
                 bmi = bmi)
df

cat("reshaped to wid format\n")
reshape(data = df,
        idvar = "patientid",
        timevar = "n",
        # c("weight", "height", "bmi"),
        direction = "wide")

#?reshape()

Hernando please add credit to Lisa from Sololearn. you copied the question there abr repasted answer here. — user9507356, Aug 10 '21 at 17:29

group by and then expand to add new columns

3 Answers3

Output:

Data: