I have the following data frame structure:

ID conception_date birth_date med_1 med_2 med_3 med_4 ... med_n
A xxxx xxxx 1 0 0 0 ...
A xxxx xxxx 0 1 0 0 ...
A xxxx xxxx 0 0 1 0 ...
B xxxx xxxx 1 0 0 0 ...
B xxxx xxxx 0 1 0 0 ...
B xxxx xxxx 0 0 1 0 ...
B xxxx xxxx 0 0 0 1 ...
C xxxx xxxx 1 0 0 0 ...
C xxxx xxxx 0 0 0 1 ...

I would like to group the rows by ID, conception_date and birth_date so that each person has a single row, with every medication column summed within the group. The structure would then become:

ID conception_date birth_date med_1 med_2 med_3 med_4 ... med_n
A xxxx xxxx 1 1 1 0
B xxxx xxxx 1 1 1 1
C xxxx xxxx 1 0 0 1
  • Please provide better sample data and explain what you have tried thus far. – Chamkrai Jun 13 '23 at 13:41
  • Is the `N_person` row supposed to be a "total row"? If so, why is it in the original, unsummarized dataset? And is `med_N` a rowwise sum of `med_1` through `med_4`, or is it just a placeholder, as if to say "there are `N` columns of the form `med_*`"? – Greg Jun 13 '23 at 13:41
  • @Greg No, `N_person` and `N_med` just indicate that the data go up to the Nth ID and the Nth medication, since I have more than 4 medications and more than 3 IDs. I updated the data frame to include only 3 IDs and 4 medications to avoid confusion. – Youknowme Jun 13 '23 at 13:46
  • `library(dplyr); your_data |> summarize(across(everything(), sum), .by = c(ID, conception_date, birth_date))` – Gregor Thomas Jun 13 '23 at 13:47
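The one-liner in the last comment can be tried on a small stand-in for the data in the question. This is only a sketch under assumptions: the data frame name `meds`, the result name `per_person`, and the dates are made up for illustration (the question shows them only as `xxxx`), and `starts_with("med_")` is used instead of `everything()` on the assumption that every medication column shares the `med_` prefix. The `.by` argument needs dplyr 1.1.0 or later, and the native pipe `|>` needs R 4.1 or later.

```r
library(dplyr)  # .by requires dplyr >= 1.1.0

# Hypothetical stand-in for the question's data; the dates are placeholders.
meds <- data.frame(
  ID              = c("A", "A", "A", "B", "B", "B", "B", "C", "C"),
  conception_date = as.Date(c(rep("2020-01-01", 3), rep("2020-02-01", 4), rep("2020-03-01", 2))),
  birth_date      = as.Date(c(rep("2020-10-01", 3), rep("2020-11-01", 4), rep("2020-12-01", 2))),
  med_1 = c(1, 0, 0, 1, 0, 0, 0, 1, 0),
  med_2 = c(0, 1, 0, 0, 1, 0, 0, 0, 0),
  med_3 = c(0, 0, 1, 0, 0, 1, 0, 0, 0),
  med_4 = c(0, 0, 0, 0, 0, 0, 1, 0, 1)
)

# One row per ID / conception_date / birth_date, summing each med_* column
# within the group. Groups appear in order of first occurrence.
per_person <- meds |>
  summarize(
    across(starts_with("med_"), sum),
    .by = c(ID, conception_date, birth_date)
  )

print(per_person)
```

If the same medication can appear on several rows for one person, `sum` will return counts above 1; swapping `sum` for `max` would keep the columns as 0/1 flags instead.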
