I am having trouble figuring out how to extract specific text within a string. My dataset has been pulled from de-identified electronic health records, and contains a list of every medication that our patients have been prescribed. I am, however, only concerned with a specific list of medications, which I have in another table. Within each cell is the name of the medication, dose, and form (Tabs, Caps, etc.) [See image]. Much of this information is not important for my analysis though, and I only need to extract the medication names that match my list. It might also be useful to extract the first word from each string, as it is (in most cases) the name of the medication.
I have examined a number of different methods of pulling substrings, but haven't quite found something that meets my needs. Any help would be greatly appreciated.
Thanks.
Data DRUGS;
infile datalines flowover;
length drug1-drug69 $20;
array drug[69];
input (drug1-drug69)($);
datalines;
AMITRIPTYLINE
AMOXAPINE
BUPROPION
CITALOPRAM
CLOMIPRAMINE
DESIPRAMINE
DOXEPIN
ESCITALOPRAM
FLUOXETINE
FLUVOXAMINE
IMIPRAMINE
ISOCARBOXAZID
MAPROTILINE
MIRTAZAPINE
NEFAZODONE
NORTRIPTYLINE
PAROXETINE
PHENELZINE
PROTRIPTYLINE
SERTRALINE
TRANYLCYPROMINE
TRAZODONE
TRIMIPRAMINE
VENLAFAXINE
AMITRIP
ELEVIL
ENDEP
LEVATE
ADISEN
AMOLIFE
AMOXAN
AMOXAPINE
DEFANYL
OXAMINE
OXCAP
WELLBUTRIN
BUPROBAN
APLENZIN
BUDEPRION
ZYBAN
CELEXA
ANAFRANIL
NORPRAMIN
SILENOR
PRUDOXIN
ZONALON
LEXAPRO
PROZAC
SARAFEM
LUVOX
TOFRANIL
TOFRANIL-PM
MARPLAN
LUDIOMIL
REMERON
REMERONSOLTAB
PAMELOR
PAXIL
PEXEVA
BRISDELLE
NARDIL
VIVACTIL
ZOLOFT
PARNATE
OLEPTRO
SURMONTIL
EFFEXOR
DESVENLAFAXINE
PRISTIQ
;;;;
run;
Data DM4_;
if _n_=1 then set DRUGS;
array drug[69];
set DM4;
do _i = 1 to countw(Description,' ().,');
_med = scan(Description,_i,' ().,');
_whichmed = whichc(_med, of drug[*]);
if _whichmed > 0 then leave;
end;
run;
Data DM_Meds (drop = drug1-drug69 _i _med _whichmed);
Set DM4_;
IF _whichmed > 0 then anti = _med;
else anti = ' ';
run;