For a project I need to preprocess data from a hospital and eventually build a predictive model.

In one of my preprocessing steps, I need to make a column that represents the cumulative number of days a patient has been in the hospital. This number is determined by looking at several other columns in different rows. Also, a patient can be hospitalized multiple times on different occasions. I'm sorry if this is very confusing.

I've added a picture of a dataframe (image: sample of my data). I want to know how I can write R code that creates the column cdays from the columns Patientid and Date.

I've tried numerous ways to do this. Some used for and while loops with counters. Others used a nested ifelse with new vectors (so I could compare the current row with the row from the previous iteration):

# i-1 and i c.days
df$c.days <- 0
df$i_min_1c.days <- 0

# i and i+1 date
iDate <- df$Date[1:(nrow(df) - 1)]
i_plus_1Date <- df$Date[2:(nrow(df))]

# i and i+1 patientid
iPatientid <- df$Patientid[1:(nrow(df) - 1)]
i_plus_1Patientid <- df$Patientid[2:(nrow(df))]

newNew <- c(ifelse(iPatientid == i_plus_1Patientid, ifelse(i_plus_1Date - iDate > 1, 1, df$i_min_1c.days + 1), 1), df$c.days[nrow(df)])

Obviously this didn't work, but I had run out of ideas. Could anyone point me in the right direction on how to proceed?

Some notes:
- The complete dataframe is 800k rows long and 9 columns wide (keep in mind conversions will take a lot of time).
- The value of cdays starts at 1 since it will be used as a multiplier.
- If the date difference between the ith and (i+1)th row is bigger than 1 day, it is considered a new session and the cdays value is 1.

If you need any more information, feel free to ask. I will try my best! Thank you very much and I'm sorry for my bad English.

cmaher
  • 'This number is determined by looking at several other columns' - please provide them. – anotherfred Apr 17 '18 at 15:17
  • I suggest you read a little bit about providing a good question: [SO q/a on reproducibility](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), [SO's help/mcve](https://stackoverflow.com/help/mcve), and [SO's how-to-ask](https://stackoverflow.com/help/how-to-ask). Bottom line: if you cannot execute the code in this question in a fresh/empty R session, then neither can anyone else, often making it very difficult to provide relevant advice or answers. – r2evans Apr 17 '18 at 15:21
  • (Saying the same thing as r2e a different way) We do not need to see your actual data. Instead you should work on making a minimal reproducible example for us to look at that covers the issue you're facing: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank Apr 17 '18 at 15:21
  • *to make the column cdays out of the column Patientid and Date* ... but your sample screenshot has such a column, *cdays*. What is your objective? – Parfait Apr 17 '18 at 15:38
  • I want to know how to make such a column with R code. I've made a sample with the result I am seeking. – Hussain Rahiminejad Apr 17 '18 at 16:02
  • @anotherfred in the sample the column "cdays" is the number I want as a result. It's just a counter that counts how many consecutive days a unique patient has been in a hospital, with the condition that the difference between consecutive dates isn't longer than 1 day. – Hussain Rahiminejad Apr 17 '18 at 16:08
  • Hussain, please do not expect people to transcribe from an image of data; it is just as easy to paste the output of `dput(head(x,18))` as it is to take a screenshot and upload to SO/imgur. And because it makes it much easier for potential answerers to test with *your* data, it makes it much more likely that you can get a *prompt* and *relevant* answer. – r2evans Apr 17 '18 at 20:12

1 Answer

Given that you have a large dataset, use data.table.

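library(data.table)
setDT(df) # convert to a data.table by reference
setorder(df, Patientid, Date) # assumes Date is of class Date, not character
# a new session starts on a patient's first row or after a gap of more than 1 day
df[, session := cumsum(is.na(shift(Date)) | as.integer(Date - shift(Date)) > 1L), by = Patientid]
# count the days within each session, starting at 1
df[, cdays := seq_len(.N), by = .(Patientid, session)]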

The question is vague enough that I may easily have misunderstood it.
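
To check the logic on a small case, here is a base-R sketch that needs no extra packages. The sample data is made up for illustration (only the column names Patientid and Date come from the question). It assumes Date is of class Date, that each row is one hospital day, and that a gap of more than 1 day starts a new session.

```r
# Hypothetical sample data; only the column names come from the question.
df <- data.frame(
  Patientid = c(1, 1, 1, 1, 1, 2, 2, 2),
  Date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03",
                   "2018-01-10", "2018-01-11",
                   "2018-02-01", "2018-02-03", "2018-02-04"))
)

# Sort so that each patient's rows are in chronological order.
df <- df[order(df$Patientid, df$Date), ]

# Gap in days to the previous row of the same patient
# (NA on each patient's first row).
gap <- ave(as.numeric(df$Date), df$Patientid,
           FUN = function(d) c(NA, diff(d)))

# A new session starts on a patient's first row or after a gap > 1 day.
new_session <- is.na(gap) | gap > 1

# Running session number within each patient.
session <- ave(as.integer(new_session), df$Patientid, FUN = cumsum)

# Day counter within each (patient, session) pair, starting at 1.
df$cdays <- ave(seq_along(session), df$Patientid, session, FUN = seq_along)

print(df)
```

For this sample, cdays comes out as 1, 2, 3, 1, 2 for patient 1 and 1, 1, 2 for patient 2. The idea is the same as a grouped cumulative sum: flag where a session starts, turn the flags into session numbers, then count rows within each session. On 800k rows the vectorised ave calls should be far faster than a row-by-row loop, though I have not benchmarked this.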

anotherfred