
I have a data frame with many thousands of rows. Every row is a hospitalization record; it contains the patient's ID and a lot of health information (diagnosis, date of admission, date of discharge, and so on).

Every patient can have more than one hospitalization record, but I need only the first hospitalization of each patient, i.e. the earliest record for each patient ID according to the date of admission. How can I get this result in R?

Thank you in advance.

    You need to post sample data to make your example [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). Maybe `library(dplyr) ; df %>% group_by(patientID) %>% filter(admissionDate == min(admissionDate))` – alistaire May 06 '16 at 19:09
    `library(data.table); setDT(data); data[order(admission_date), .SD[1], by = patient_id]` – MichaelChirico May 06 '16 at 19:15


I think I have a solution, but there's probably a smoother way to do this.

Try this using dplyr. Note that I assume by 'first' record you mean the oldest record (earliest admission date). If you want the most recent record, use max() instead.

```r
install.packages('dplyr')  # only needed once
library(dplyr)

your_data <- group_by(your_data, patientID)
## This gives you a data frame with each patient's ID and first admission date
first_records <- summarise(your_data, admit_date = min(admit_date))

## Create an ID to match on (patient ID + admission date)
first_records$matchID <- paste(first_records$patientID, first_records$admit_date)
your_data$matchID <- paste(your_data$patientID, your_data$admit_date)

## Get the complete records for those first visits
first_records <- your_data[your_data$matchID %in% first_records$matchID, ]
```
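If you'd rather not depend on dplyr, the same "keep the rows whose date equals the patient's minimum" idea can be sketched in base R with `ave()`, without building a match key. This is only a sketch: the data frame is hypothetical toy data, `patientID`, `admit_date` and `diagnosis` are assumed column names, and `admit_date` is assumed to already be of class `Date`.

```r
# Hypothetical toy data; column names are assumptions, admit_date is a Date
your_data <- data.frame(
  patientID  = c(1, 1, 2, 2, 3),
  admit_date = as.Date(c("2015-03-01", "2014-07-15", "2016-01-10",
                         "2016-01-02", "2015-05-20")),
  diagnosis  = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# ave() replaces each date with the minimum date of its patient group,
# so the comparison flags each patient's earliest admission(s)
earliest <- ave(your_data$admit_date, your_data$patientID, FUN = min)
first_records <- your_data[your_data$admit_date == earliest, ]
first_records
```

With this toy data it returns the rows with diagnoses B, D and E, in their original row order.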

Lemme know how this goes.

EDIT: @alistaire posted what definitely looks like an easier solution:

```r
your_data <- group_by(your_data, patientID)
first_records <- filter(your_data, admit_date == min(admit_date))
```
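One caveat: both the match-key approach and the `filter()` one-liner keep every row whose admission date equals the patient's minimum, so a patient admitted twice on their earliest date gets two rows. If you need exactly one row per patient, a base-R sketch is to sort by patient and date and then drop repeated IDs with `duplicated()`, which is the same "order, then take the first row per group" idea as the data.table comment. The data frame below is hypothetical toy data with assumed column names.

```r
# Hypothetical toy data: patient 1 has two admissions on the same earliest date
your_data <- data.frame(
  patientID  = c(1, 1, 1, 2),
  admit_date = as.Date(c("2016-02-01", "2015-06-30", "2015-06-30", "2016-04-12")),
  diagnosis  = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Sort by patient, then by admission date (earliest first);
# order() is stable, so tied dates keep their original relative order
sorted <- your_data[order(your_data$patientID, your_data$admit_date), ]

# duplicated() is FALSE only for the first occurrence of each patientID,
# so this keeps exactly one row per patient: the earliest admission
first_records <- sorted[!duplicated(sorted$patientID), ]
first_records
```

Here patient 1 keeps only the row with diagnosis B, because it comes first among the tied rows in the original data.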
Raphael K