0

I am working on a healthcare dataset from Kaggle (https://www.kaggle.com/joniarroba/noshowappointments), which contains information about medical appointments in Brazil and whether or not the patient showed up. The dataset has columns for appointment ID, patient ID, appointment date and time, scheduled date and time, and several other features.

I want to add a column to the DataFrame that shows the no-show rate for each appointment based on the patient's previous appointments. For example, if a patient has had three appointments and showed up for two of them, the no-show rate for their fourth appointment would be 1/3. If the patient is a first-time appointment, the no-show rate would be 0.

    # convert appointment and scheduled dates to datetime format
    df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])
    df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])

    # create a new column with the time difference between scheduled and 
    appointment date
    df['time_diff'] = (df['AppointmentDay'].dt.date - 
    df['ScheduledDay'].dt.date).dt.days

   # group by PatientId and calculate no-show count and appointment count 
   for each group
   grouped = df.groupby('PatientId')['No-show'].apply(lambda x: 
   x.eq('Yes').cumsum().shift().fillna(0))
   df['no_show_count'] = grouped
   df['appointment_count'] = grouped + df.groupby('PatientId').cumcount()

   # calculate no-show rate for each patient
   df['no_show_rate'] = df['no_show_count'] / df['appointment_count']

   # replace NaN values in 'no_show_rate' column with 0
   df['no_show_rate'] = df['no_show_rate'].fillna(0)

    # print first 5 rows
    print(df.head())

    the problem is in this code is it calculate current appointment. for 
    example  if you 
     df[df['PatientId'] == 
     112397157856688.0].sort_values('AppointmentDay') ,
    you will understand 
    better what i mean
  • 1
    What have you tried, and what went wrong with your attempts? Your question should include a [mcve] with sample input (in the text of the question, not as an external link), expected output, and _code_ showing your own efforts, so that we can offer specific answers. See [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – G. Anderson Mar 01 '23 at 17:08
  • A simple approach is to: 1) add a column that converts the No-show categorical column of Yes/No to 1/0; 2) then groupby PatientId and use transform of mean of this 1/0 column to get no-show rate. – DarrylG Mar 01 '23 at 17:38
  • @G.Anderson i insert the code .could you plz help me? – pooya ensafi Mar 02 '23 at 10:31

0 Answers0