Wonderful people of Stack Overflow!

I am struggling with my R code. I have a whole bunch of people who have had many hospital visits, and what I'm trying to get is the FIRST instance of each disease for every person. I have 6613 observations after removing duplicates, and 1306 unique IDs in my dataset. So I know I need at least 1306 first-disease rows, and probably more, seeing as some people have multiple co-morbidities.

I have already arranged the data by patient, and then by date. For example: [image: What my dataset looks like]

So for patient 0001, I want to get their FIRST case of angina, Chronic IHD and whatever other issues they might have (in reality some patients have 17 hospital visits, and most of them are rediagnosed).

I have tried a couple of solutions found on Stack Overflow, but I get ridiculous answers like 35 observations. This got me closest, using dplyr:

data_new <- data %>% group_by(iid) %>% arrange(AdmiDate) %>% slice(1L)

But I still don't have the number I would expect; like I said, I should get at least 1306.

Any help would be greatly appreciated!! Thank you so much in advance!

zx8754

2 Answers

Without the data it is hard to know, but if I were to guess by looking at your picture, I'd think the following should work.

data %>% group_by(ID, Def) %>% filter(AdmiDate == min(AdmiDate))

This filters for the earliest date (min(AdmiDate)) within each ID, Def group, so you get one first case per patient and diagnosis rather than one row per patient.
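
As a minimal sketch of the grouping idea above, using made-up data: the column names ID, Def and AdmiDate and all values here are assumptions, since the real dataset is only shown as a picture. Grouping by patient and diagnosis keeps the earliest row for each pair, whereas grouping by patient alone keeps only each patient's first visit.

```r
library(dplyr)

# Hypothetical mock data; column names and values are assumed for illustration
visits <- data.frame(
  ID = c("0001", "0001", "0001", "0002", "0002"),
  Def = c("Angina", "Angina", "Chronic IHD", "Angina", "Angina"),
  AdmiDate = as.Date(c("2005-03-21", "2007-01-10", "2008-09-17",
                       "2006-06-18", "2009-12-07"))
)

# One earliest row per patient AND diagnosis
first_cases <- visits %>%
  group_by(ID, Def) %>%
  filter(AdmiDate == min(AdmiDate)) %>%
  ungroup()

# For comparison: grouping by patient only keeps just the first visit
first_visit <- visits %>%
  group_by(ID) %>%
  filter(AdmiDate == min(AdmiDate)) %>%
  ungroup()

print(first_cases)
print(first_visit)
```

With this mock data, first_cases has one row per (ID, Def) pair, which is why the row count can exceed the number of unique patients.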

fvall
  • Wonderful, WONDERFUL person, I cannot thank you enough. One last question, if I may: when I do this, it chucks out any values where a date is missing. How would I keep these in? I've tried including na.rm=F and is.na into the answer you gave but it has a little meltdown. Any tips? – Confused_Unicorn Apr 21 '21 at 08:56
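
One possible way to handle the missing dates raised in the comment, sketched under assumptions: the column names ID, Def and AdmiDate and the data are made up, and "keeping them in" is read here as "a patient/diagnosis pair whose only dates are missing should still appear once". The comparison AdmiDate == min(AdmiDate) evaluates to NA whenever a group contains a missing date, and filter() drops those rows. Sorting within each group and taking the first row sidesteps the comparison, because dplyr's arrange() always places missing values last.

```r
library(dplyr)

# Hypothetical mock data with missing admission dates
visits_na <- data.frame(
  ID = c("0001", "0001", "0001", "0002", "0002"),
  Def = c("Angina", "Angina", "Chronic IHD", "Angina", "Angina"),
  AdmiDate = as.Date(c("2005-03-21", NA, NA, "2009-12-07", "2006-06-18"))
)

first_cases_na <- visits_na %>%
  group_by(ID, Def) %>%
  # Sort by date within each group; NA dates are placed last
  arrange(AdmiDate, .by_group = TRUE) %>%
  # First row per group: the earliest known date, or an NA row if no date is known
  slice(1) %>%
  ungroup()

print(first_cases_na)
```

If every row with a missing date should be kept instead, the filter condition would need an explicit is.na(AdmiDate) branch; which behaviour is wanted depends on the analysis.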

Here's a dplyr solution with mock data: first group_by the two grouping variables ID and def, then select the rows with the earliest date per group using slice_min:

library(dplyr)
df %>%
  group_by(ID, def) %>%
  slice_min(admidate)
# A tibble: 4 x 3
# Groups:   ID, def [4]
  ID    def   admidate  
  <chr> <chr> <date>    
1 0001  A     2005-03-21
2 0001  B     2008-09-17
3 0002  A     2006-06-18
4 0002  X     2009-12-07

Data:

df <- data.frame(
  ID = c("0001", "0001", "0002", "0002", "0002"),
  def = c("A", "B", "X", "A", "X"),
  admidate = as.Date(c("21/03/2005", "17/09/2008", "07/12/2009", "18/06/2006", "22/11/2021"), format = "%d/%m/%Y")
)
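
One thing worth knowing about slice_min, sketched with made-up data: by default it keeps ties (with_ties = TRUE), so two records for the same patient and diagnosis on the same earliest date are both returned. The ward column below is hypothetical and only there to make the two tied rows differ. Setting with_ties = FALSE should return exactly one row per group.

```r
library(dplyr)

# Hypothetical data: patient 0001 has two records for diagnosis A on the same day
df_ties <- data.frame(
  ID = c("0001", "0001", "0001"),
  def = c("A", "A", "B"),
  admidate = as.Date(c("2005-03-21", "2005-03-21", "2008-09-17")),
  ward = c("X", "Y", "X")  # assumed extra column, for illustration only
)

# Default behaviour: tied earliest rows are all kept
with_ties <- df_ties %>%
  group_by(ID, def) %>%
  slice_min(admidate, n = 1) %>%
  ungroup()

# with_ties = FALSE: exactly one row per group
one_per_group <- df_ties %>%
  group_by(ID, def) %>%
  slice_min(admidate, n = 1, with_ties = FALSE) %>%
  ungroup()

print(with_ties)
print(one_per_group)
```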
Chris Ruehlemann