0

I have a dataset of 1000 patients (Patient 1-1000) undergoing the same type of procedure but by diffrent surgeons. I am interested to count the number of operations performed by each surgeon from study start date 27/06/2023 (for example) before the next procedure and insert this count/number per each patient/row. For example I need to know how many operations surgeon A performed on previous patients (1 and 2) before operating on patient 3 (e.g. 2 operation).Same for surgeon B etc.

I gues there is some formula in dplyr but I cannot get my head around it.

Image of dataset and what I need to get

Patient Surgeon Operation Date  Event before index (operation date)
1   A   28/06/2023  0
2   A   29/06/2023  1
3   A   30/06/2023  2
4   B   1/07/2023   0
5   C   2/07/2023   1
6   C   3/07/2023   2
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • 1
    Hi Daniel! What does your starting dataset look like? Please read the following post to make your question reproducible and get better, faster help: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – jpsmith Jun 28 '23 at 17:08
  • 1
    This existing question covers a number of possible methods to perform this general concept: [Numbering rows within groups in a data frame](https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame) The only difference for your case is you'll need to subtract 1 from your "counter" column. – Matt Summersgill Jun 28 '23 at 17:22

2 Answers2

2

It looks like your desired result is the row number for each surgeon minus one.

library(dplyr)

your_data |>
  mutate(event_before_index = row_number() - 1, .by = Surgeon)

This assumes your data is already sorted by date. If not, you can convert your date column to a proper Date class and sort it before running the above command.

library(lubridate)
your_data |>
  mutate(Operation_Date = dmy(Operation_Date)) |>
  arrange(Surgeon, Operation_Date) |>
  mutate(event_before_index = row_number() - 1, .by = Surgeon)
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
-1

Please try the below code which can be give the Event before index (operation date) even if the patient is repeated operated by surgeon previously please check the new variable

however at some places it does not match i.e., highlighted with #

library(dplyr)

df_2 <- df %>% group_by(Surgeon,Patient) %>% 
  slice_tail(n=1) %>% group_by(Surgeon) %>% mutate(new=row_number()-1)

df_or <- df %>% 
  left_join(df_2 %>% select(Patient, Surgeon, new), by=c('Surgeon','Patient')) 


# output

# A tibble: 6 × 5
  Patient Surgeon Operation_Date Event_before   new
    <dbl> <chr>   <date>                <dbl> <dbl>
1       1 A       2023-06-28                0     0
2       2 A       2023-06-29                1     1
3       3 A       2023-06-30                2     2
4       4 B       2023-07-01                0     0
5       5 C       2023-07-02                1     0 #
6       6 C       2023-07-03                2     1 #

jkatam
  • 2,691
  • 1
  • 4
  • 12