I have a data frame like as shown below
df = pd.DataFrame({'person_id': [11,11,11,11,11,11,11,11,11,11],
'text':['inJECTable 1234 Eprex DOSE 4000 units on NONd',
'department 6789 DOSE 8000 units on DIALYSIS days - IV Interm',
'inJECTable 4321 Eprex DOSE - 3 times/wk on NONdialysis day',
'insulin MixTARD 30/70 - inJECTable 46 units',
'insulin ISOPHANE -- InsulaTARD Vial - inJECTable 56 units SC SubCutaneous',
'1-alfacalcidol DOSE 1 mcg - 3 times a week - IV Intermittent',
'jevity liquid - FEEDS PO Jevity - 237 mL - 1 times per day',
'1-alfacalcidol DOSE 1 mcg - 3 times per week - IV Intermittent',
'1-supported DOSE 1 mcg - 1 time/day - IV Intermittent',
'1-testpackage DOSE 1 mcg - 1 time a day - IV Intermittent']})
I would like to remove the words/strings which follow patterns such as 46 units
, 3 times a week
, 3 times per week
, 1 time/day
etc.
I was reading about positive and negative look ahead and behind.
So, was trying something like below
[^([0-9\s]*(?=units))] #to remove terms like `46 units` from the string
[^[0-9\s]*(?=times)(times a day)] # don't know how to make this work for all time variants
time variants ex: 3 times a day
, 3 time/wk
, 3 times per day
, 3 times a month
, 3 times/month
etc.
Basically, I expect my output to be something like below (remove terms like xx units, xx time a day, xx times per week, xx time/day, xx time/wk, xx time/week, xx times per week, etc)