0

Firstly, sorry for a complex question from a non-complex user. I'm a medical researcher trying to create a list of people who have had both a CT scan and X-ray within 3 months. I have two excel/csv documents:

(1) is all people who have had a CT scan and has four headings: Ref_CT, Date_CT, Patient ID_CT, Patient Name_CT;
(2) is all people who have had an X-ray and has four: XR Ref, Date, Patient ID, Patient ID.

In both of these excels, only the references are unique values, as most people have have 2+ studies, and multiple studies happen on the same date.

My goal is to print rows of: Patient ID; Patient Name; CT_Date; Xray_1_Date; Xray_2_Date;... Xray_10_Date (people have had up to 10 Xrays). Any advice on a technique to use, however general, would be fantastic.

Thank you so much

Dataframes (1) (Headings offset for ease of interpretation)

       Ref_CT    Date_CT     Patient ID_C               Patient Name_CT
0           1   9/8/2019          12345             PersonA, Felix (Mr)
1           2   9/6/2018          12346            PersonB, Anthony(Mr)
2           3   9/6/2017          12347             PersonC, Trev (Mrs)
3           4   9/6/2016          12348            PersonD, Jeremy (Mr)
4           5   9/8/2015          12348            PersonD, Jeremy (Mr)

(2)

          X-Ray Ref        Date     Patient ID                 Patient Name
0                 1   9/10/2019       12345             PersonA, Felix (Mr)
1                 2   9/10/2018       12345             PersonA, Felix (Mr)
2                 3   9/10/2017       12345             PersonA, Felix (Mr)
3                 4   9/10/2016       12347             PersonC, Trev (Mrs)
4                 5   9/10/2015       12348            PersonD, Jeremy (Mr)

Expected Output

         Patient_ID    CT_Date     XR1_Date     XR2_Date     XR3_date
0             12345   9/8/2019    9/10/2019     9/6/2018    9/16/2017
1             12346   9/6/2019    
2             12347   9/6/2017    9/10/2016
3             12348   9/6/2016    9/10/2015
4             12348   9/8/2015    9/10/2015

(Thank you @sammywemmy for providing advice on asking a better question)

zeborg
  • 1
  • 1
  • 2
    kindly share a sample dataframe, just a few rows and ur expected output. No pics, just data. Use this as a guide : https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – sammywemmy Feb 10 '20 at 06:13
  • no worries. noticed something... in your expected output, there is no row for patient id 12346, it seems you lumped that data with id 12345. any reason behind that? – sammywemmy Feb 10 '20 at 07:05
  • Looks like at least one patient changed gender between x-ray and ct scan? – jkpieterse Feb 10 '20 at 07:07
  • Hahaha, the gender switch-up happened as I made up datapoints and forgot to keep gender consistent. And I accidentally left out 12346 because once I can produce this data frame I'll use excel to delete patients that don't have both scans, and then filter for scans within 3 months of each other. That's all fairly easy to do, but this complex cross-referencing can't be done in excel, hence why my rapid plunge into python – zeborg Feb 10 '20 at 07:45

1 Answers1

0

For naming you data frames as df1 nad df2 use merge and groupby:

data = df1.merge(df2, left_on='Patient ID', right_on='Patient ID')
data = data.drop(['Ref_CT', 'X-Ray Ref', 'Patient Name_y'], axis=1)
data = data[['Patient ID', 'Patient Name_x', 'CT_Date', 'XR1_Date']]
data = data.groupby(['Patient ID', 'Patient Name_x'])[['CT_Date','XR1_Date']].agg(lambda x: list(set(x.values.tolist())))

Output:

You get compact data without multiple columns. Hope this is acceptable output.

                                              CT_Date                           XR1_Date
Patient ID Patient Name_x                                                               
12345      PersonA, Felix (Mr)             [9/8/2019]  [9/10/2017, 9/10/2019, 9/10/2018]
12347      PersonC, Trev (Mrs)             [9/6/2017]                        [9/10/2016]
12348      PersonD, Jeremy (Mr)  [9/6/2016, 9/8/2015]                        [9/10/2015]
Zaraki Kenpachi
  • 5,510
  • 2
  • 15
  • 38