I'm learning Python and thought working on a project might be the best way to learn it. I have about 200,000 rows of data in which the data shows list of medication for the patient. Here's a sample of the data.
PTID PTNAME MME DRNAME DRUGNAME SPLY STR QTY FACTOR
1 PATIENT, A 2700 DR, A OXYCODONE HCL 15 MG 30 15 120 1.5
1 PATIENT, A 2700 DR, B OXYCODONE HCL 15 MG 30 15 120 1.5
2 PATIENT, B 4050 DR, C MORPHINE SULFATE ER 15 MG 30 15 270 1
2 PATIENT, B 4050 DR, C MORPHINE SULFATE ER 15 MG 30 15 270 1
2 PATIENT, B 840 DR, A OXYCODONE-ACETAMINOPHE 10MG-32 14 10 56 1.5
2 PATIENT, B 1350 DR, C OXYCODONE-ACETAMINOPHE 5 MG-32 15 5 180 1.5
3 PATIENT, C 1350 DR, C OXYCODONE-ACETAMINOPHE 5 MG-32 15 5 180 1.5
3 PATIENT, C 1800 DR, D OXYCODONE-ACETAMINOPHE 10MG-32 30 10 120 1.5
I've been thinking about this a lot and have tried many ways but none of the code produce any results or makes any sense. Honestly, I don't even know where to begin. A little help would be highly appreciated.
So, what I want to do is consolidate the data for each patients and calculate the Total MME
for each patient. The DRUGNAME
should show the one that has higher MME. In other words, the dataframe should only have one row for each patient.
One thing I did try is
groupby_ptname = semp.groupby('PTNAME').apply(lambda x: x.MME.sum())
which shows unique patient names with total MME, but I'm not sure how to add other variables in this new dataframe.