0

My data look like this

Person ID Treatment code
A 00001
A 00002
A 00003
B 00002
B 00012
C 00003
C 00015

Each patient can have an infinite number of treatment. What I would like is a line for each patient with as much columns as the maximum number of treatment a patient can have. And, in each column of treatment, the corresponding code.

Pateint Id Treatment 1 Treatment 2 Treatment 3
A 00001 00002 00003
B 00002 00012 missing
C 00003 00015 missing

I used to do this very easily in Stata but I am now struggling with R. I have tried to use the tidyr package and the pivot_wider function but did not succeed.

Thanks a lot for your help!

Robisch
  • 3
  • 1

1 Answers1

0

While this could look like a clear duplicate of How to reshape data from long to wide format, there is a missing link. Before we start pivoting we need to obtain the row number per group.

E.g.

library(dplyr)
library(tidyr)

df |> 
  group_by(PersonID) |> 
  mutate(name = row_number()) |> 
  ungroup() |> 
  pivot_wider(id_cols = "PersonID", 
              values_from = "Treatmentcode",
              names_prefix = "Treatment")

Output:

# A tibble: 3 × 4
  PersonID Treatment1 Treatment2 Treatment3
  <chr>    <chr>      <chr>      <chr>     
1 A        00001      00002      00003     
2 B        00002      00012      NA        
3 C        00003      00015      NA       

Data:

library(readr)

df <- read_table("PersonID  Treatmentcode
A   00001
A   00002
A   00003
B   00002
B   00012
C   00003
C   00015") 
harre
  • 7,081
  • 2
  • 16
  • 28
  • It has worked, thanks. I actually missed the part where the data are first grouped by person ID. However. I am not sure to understand how the ```name``` variable is considered in ```pivot_wider()```? – Robisch Sep 01 '22 at 16:01
  • It's the standard parameter for the`names_from`-argument. – harre Sep 01 '22 at 17:15