
Let's say I have this df (but with thousands of IDs and tens of treatments):

df = data.frame(ID = c(1,1,1,2,2,2,2,2),
                treatment = c('AB','CD','EF','AB','CD','GH','IM','LN') )

I want to obtain the following output:

   ID treatment_1 treatment_2 treatment_3 treatment_4 treatment_5
1  1          AB          CD          EF        <NA>        <NA>
2  2          AB          CD          GH          IM          LN

What I don't want:

    ID   .by Treatment_1 Treatment_2 Treatment_3 Treatment_4 Treatment_5 Treatment_6 Treatment_7 Treatment_8
  <dbl> <dbl> <chr>       <chr>       <chr>       <chr>       <chr>       <chr>       <chr>       <chr>      
1     1     1 AB          CD          EF          NA          NA          NA          NA          NA         
2     2     2 NA          NA          NA          AB          CD          GH          IM          LN  
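
The unwanted layout, where ID 2's treatments land in Treatment_4 to Treatment_8, is what you get when the counter that names the columns runs over the whole data frame (1 to 8) instead of restarting at 1 for each ID. As a hedged, package-free sketch, the per-ID counter can be built with base R's `ave()` and the table widened with `stats::reshape()`. This is a swapped-in base-R technique, not the dplyr/tidyr code from the comments, and `rown` and `wide` are just illustrative names:

```r
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2, 2),
                 treatment = c("AB", "CD", "EF", "AB", "CD", "GH", "IM", "LN"),
                 stringsAsFactors = FALSE)  # keep treatment as character on R < 4.0

# Position of each row within its own ID: 1 2 3 for ID 1, 1 2 3 4 5 for ID 2
df$rown <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

# One row per ID and one treatment_<rown> column per position;
# IDs with fewer treatments get NA in the trailing columns
wide <- reshape(df, idvar = "ID", timevar = "rown", direction = "wide", sep = "_")
rownames(wide) <- NULL
wide
```

Because `sep = "_"` is used, the new columns are named `treatment_1` to `treatment_5`, the same names as in the desired output.
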
Anas116
  • `library(dplyr); library(tidyr); df %>% mutate(rown = row_number(), .by = ID) %>% pivot_wider(names_from = "rown", values_from = "treatment", names_prefix = "Treatment_")` – Maël Jul 27 '23 at 12:13
  • Switching from long to wide format is probably going to make your analysis more difficult here. Are you _sure_ you need to switch to that format? Might you even be better with a column for each treatment, indicating whether the subject got that treatment? Something like `table(ID = df$ID, treatment = df$treatment) == 1`? – Allan Cameron Jul 27 '23 at 12:17
  • Unfortunately, no, that won't work, because my variable has around a hundred different categories. – Anas116 Jul 27 '23 at 12:21
  • @Maël That's still not what I'm looking for; the output of your code is not exactly the output I want. – Anas116 Jul 27 '23 at 12:22
  • This question shouldn't be closed, because it's not the same as the other question. – Anas116 Jul 27 '23 at 12:41
  • How is the output different? The code Mael provided produces output that looks to me like your desired output. – MrFlick Jul 27 '23 at 12:49
  • I've added three other duplicate links that are essentially the same: create a row number by group, and pivot. That's why this is a duplicate. – Maël Jul 27 '23 at 12:53
  • No, it isn't, because it treats each treatment as a different column. What I want is for patient 1's Treatment_1 to be AB and patient 2's Treatment_1 to be AB as well. With Maël's code, patient 1's Treatment_1 is AB but patient 2's Treatment_1 is NA. – Anas116 Jul 27 '23 at 12:57
  • You should update `dplyr` and try again. The code is correct but your version of `dplyr` is out of date and doesn't recognise the `.by` argument to do grouped calculations. Or use the older `group_by()` function. – Ritchie Sacramento Jul 27 '23 at 13:14
  • I have version 1.0.10. I tried `group_by` both as an argument and as a function, and it still did not work. Anyway, I'll see once I have an updated version; hopefully it will work then. Thanks, all of you. – Anas116 Jul 27 '23 at 14:43
  • I finally found a solution, since I was not able to update the package: ```df %>% group_by(ID) %>% mutate(rown = row_number()) %>% pivot_wider(names_from = "rown", values_from = "treatment", names_prefix = "Treatment_")``` – Anas116 Jul 27 '23 at 15:02
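
Laid out as a self-contained script, the `group_by()` workaround from the last comment might look like the sketch below. It assumes dplyr and tidyr are installed; since `group_by()` predates the `.by` argument, it should also run on older releases such as 1.0.10. The `ungroup()` step is an addition that is not in the original comment; it only drops the grouping so the result is a plain tibble, and `wide` is an illustrative name.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2, 2),
                 treatment = c("AB", "CD", "EF", "AB", "CD", "GH", "IM", "LN"),
                 stringsAsFactors = FALSE)  # keep treatment as character on R < 4.0

wide <- df %>%
  group_by(ID) %>%
  mutate(rown = row_number()) %>%   # counter restarts at 1 within each ID
  ungroup() %>%                     # drop the grouping before reshaping
  pivot_wider(names_from = "rown", values_from = "treatment",
              names_prefix = "Treatment_")
wide
```

On dplyr 1.1.0 or later, `mutate(rown = row_number(), .by = ID)` replaces the `group_by()`/`ungroup()` pair and returns ungrouped data directly; older versions do not know `.by` and instead create a new column named `.by`, which is the extra column in the unwanted output.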

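For comparison, the indicator layout suggested in the comments (one column per treatment, TRUE if the subject received it) can be sketched in base R with `table()`. This sketch uses `> 0` rather than `== 1`, an assumption on my part, so that a subject who received the same treatment more than once still shows TRUE. With around a hundred treatment categories it produces around a hundred columns, which is the objection raised in the thread.

```r
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2, 2),
                 treatment = c("AB", "CD", "EF", "AB", "CD", "GH", "IM", "LN"),
                 stringsAsFactors = FALSE)

# How many times each ID received each treatment (rows = ID, columns = treatment)
counts <- table(ID = df$ID, treatment = df$treatment)

# TRUE if the subject received that treatment at least once
received <- counts > 0
received
```
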