med2['Medication'] = ["Terazosin Hydrochloride (Terazosin) oral capsule", "simvastatin oral tablet", "zithromax z-pak (azithromycin) oral tablet", "depo-medrol (methylprednisolone) injectable suspension", "zovirax topical (acyclovir topical) topical ointment", "nystatin oral suspension"]

med3 = pd.DataFrame(med2) # Made a dataframe to try to fix some error messages

med3['Medication'] = med3['Medication'].str.lower() # Lowercase the column in place so med3 stays a DataFrame

I would like to do two things:

  1. select the medication name (in the example above, "terazosin hydrochloride") and
  2. select the generic name between the parentheses (in the example above, "terazosin").

For #1, I made a list of "stop" words/characters (my real list is longer than the above example):

stop = ['(', 'oral', 'nasal', 'inhalation', 'topical', 'sublingual', 'opthalmic', 'otic', 'rectal', 'injectable', 'transdermal', 'vaginal', 'intramuscular', 'dose', 'suspension', 'subcutaneous']

med3['MedShort'] = med3['Medication'][:stop]

For #2, I made "Index" and "End":

Index =  med3['Medication'].find('(')
End = med3['Medication'].find(')')

med3['MedGeneric'] = med3['Medication'][Index:End]

But oh boy, is it not working. Do you have any recommendations? I would appreciate it!

Edited for consistent variable naming, with apologies.

And for clarification, Medication is not consistent. Most entries follow the pattern "medication (generic) administration route", but a significant number follow the pattern "medication administration route". Thank you for your patience, and apologies that I was not clear.

Sandra T
  • Is it 'Medication' or 'MedicationName'? – deltab Jan 30 '21 at 22:15
  • I'm not sure I understand exactly what you expect. For example, do you always want to stop before '(', or could that change? Do you have a more consistent dataset, to be sure of not missing a case? – Lumber Jack Jan 30 '21 at 22:16
  • I've edited for consistent "Medication" as the name. Thank you, Everless Drop 41, I'll try that to extract the generic name! Lumber Jack, it varies. Most records have the medication (generic) and then "oral tablet" or "topical cream", but some have no generic in parentheses, such as "Nystatin oral suspension". – Sandra T Jan 30 '21 at 22:44

1 Answer


If all of your strings have the same structure, then

string = "Terazosin Hydrochloride (Terazosin) oral capsule"
drug_name, rest = string.split(" (")
generic = rest.split(")")[0]

print(drug_name.lower(), generic.lower())

will print "terazosin hydrochloride terazosin", that is, the medication name followed by the generic name.
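Since the question notes that some entries have no "(generic)" part, here is a minimal sketch of a variant that tolerates both shapes. The function name split_medication and the regular expression are illustrative assumptions, not part of the answer above; it uses only the standard re module.

```python
import re

# Assumed pattern: "name (generic) rest". The name is everything before the
# first "(", and the generic is the text inside the first pair of parentheses.
PAREN = re.compile(r"^(?P<name>[^(]*?)\s*\((?P<generic>[^)]*)\)")

def split_medication(text):
    """Return (name, generic); generic is None when there are no parentheses."""
    text = text.lower()
    match = PAREN.match(text)
    if match is None:
        # No "(generic)" part: the route words are still attached to the name.
        return text, None
    return match.group("name"), match.group("generic")

print(split_medication("Terazosin Hydrochloride (Terazosin) oral capsule"))
# ('terazosin hydrochloride', 'terazosin')
print(split_medication("nystatin oral suspension"))
# ('nystatin oral suspension', None)
```

On a pandas column, something like med3['Medication'].apply(split_medication) should give one tuple per row, which can then be unpacked into two new columns. Entries without parentheses still carry the administration route, so they need a separate trimming step.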

ScienceSnake
  • Thank you! Most follow the pattern above, but some are like "digoxin oral tablet", which is why I had that "stop" list. – Sandra T Jan 30 '21 at 22:30
  • I'm not sure I see what you mean. Can you give more examples in the original code that would cover all the cases you have in your data set? – ScienceSnake Jan 30 '21 at 22:35
  • I hope this helps make more sense: simvastatin oral tablet; zithromax z-pak (azithromycin) oral tablet; atenolol oral tablet; nexium (esomeprazole) oral delayed release capsule; nasonex (mometasone nasal) nasal spray; depo-medrol (methylprednisolone) injectable suspension; flector patch (diclofenac topical) topical film, extended release; nuvaring (ethinyl estradiol-etonogestrel) vaginal ring; metrogel-vaginal (metronidazole topical) vaginal gel with applicator; zovirax topical (acyclovir topical) topical ointment – Sandra T Jan 30 '21 at 22:39
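For entries like those in the comment above, one possible way to get the short medication name is to cut at the opening parenthesis when there is one, and otherwise at the first "stop" word. The sketch below is only an illustration: short_name and ROUTE_WORDS are hypothetical names, the word set is a small subset of the question's stop list, and it assumes that a name such as "zovirax topical" should be kept whole when it precedes parentheses.

```python
# Hypothetical subset of the question's "stop" list (administration-route words).
ROUTE_WORDS = {"oral", "nasal", "topical", "injectable", "vaginal", "suspension"}

def short_name(text, stop_words=ROUTE_WORDS):
    """Return the medication name without the generic or the route words."""
    text = text.lower()
    if "(" in text:
        # Assumption: everything before "(" is the name, even if it
        # contains a route word (e.g. "zovirax topical").
        return text.split("(", 1)[0].strip()
    kept = []
    for word in text.split():
        if word in stop_words:
            break  # the first route word ends the name
        kept.append(word)
    return " ".join(kept)

print(short_name("simvastatin oral tablet"))                     # simvastatin
print(short_name("zithromax z-pak (azithromycin) oral tablet"))  # zithromax z-pak
```

If names before the parentheses should also be trimmed at route words, dropping the "(" branch and adding "(" handling to the loop would be one way to change that behaviour.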