
I am trying to use Python and NLTK to parse doctors' notes that describe medication prescriptions. I'm looking for a method to extract two numerical values from each note: the number of items taken per dose and how often the items are taken.

1 TABLET DAILY
TAKE 1 TABLET DAILY
ONE TABLET TWICE DAILY
2 DAILY
TWO TABLETS DAILY
ONE PILL AT BEDTIME
1/2 PILL TWICE DAILY
ROLLING WALKER WITH SEAT ATTACHMENT AND HAND BRAKES
ONE PILL DAILY
1 TAB PO DAILY
ONE PILL TWICE A DAY WITH MEALS AS NEEDED
1 TABLET TWICE DAILY
300 MG BID
ONE DAILY
1 TABLET 3 TIMES DAILY AS NEEDED
1 DAILY
TAKE 1 CAPSULE BY MOUTH 4 (FOUR) TIMES A DAY.
1 TABLET EVERY 4 TO 6 HOURS AS NEEDED
1 TABLET BY MOUTH TWICE DAILY
INJECT 34 U TWICE A DAY

Any advice?

Selah
  • 1
    This might help you along the right path: http://stackoverflow.com/questions/33337410/nltk-reading-in-word-numbers-to-float-numbers – tatlar May 08 '17 at 20:45
  • 1
    You could also look at this project, I couldn't get the Earley parser python code to run but the authors seem to have been working on the same problem. http://www.mit.edu/~6.863/spring2009/projects/project16.html – griffinc May 09 '17 at 22:52

1 Answer


Doctors typically write the same instruction in several different ways in clinical notes. For example:

1 TABLET DAILY 

could also be written as

1 tab qd

If you are looking for a quick fix, writing a Python script with regular expressions might help. But if you want something more long-term, you could take a look at the data and submissions for the i2b2 Medication Information Extraction Challenge.
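
As a rough illustration of the regular-expression route, here is a minimal sketch that uses only the standard library. The names `parse_sig`, `NUMBER_WORDS`, and `FREQUENCY_PATTERNS` are made up for this example, and the vocabulary is an assumption based on the sample lines in the question plus a few common abbreviations (QD, BID, TID, QID), so expect to extend it for real notes.

```python
import re
from fractions import Fraction

# Assumed vocabulary: number words seen in (or likely near) the sample notes.
NUMBER_WORDS = {"ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4}

# Assumed frequency phrases mapped to doses per day, tried in order.
# None means "read the count from the first capture group".
FREQUENCY_PATTERNS = [
    (r"\b(\d+)\s*(?:\(\w+\)\s*)?TIMES\s+(?:DAILY|A\s+DAY)\b", None),
    (r"\bTWICE\s+(?:DAILY|A\s+DAY)\b|\bBID\b", 2),
    (r"\bTID\b", 3),
    (r"\bQID\b", 4),
    (r"\bDAILY\b|\bQD\b|\bAT\s+BEDTIME\b", 1),
]

# The quantity is assumed to be the first token, after an optional verb.
QUANTITY_RE = re.compile(r"^(?:TAKE\s+|INJECT\s+)?(\d+/\d+|\d+|[A-Z]+)\b")


def parse_quantity(text):
    """Return the leading amount as an int or Fraction, or None."""
    match = QUANTITY_RE.match(text)
    if not match:
        return None
    token = match.group(1)
    if token in NUMBER_WORDS:
        return NUMBER_WORDS[token]
    if re.fullmatch(r"\d+(?:/\d+)?", token):
        return Fraction(token)  # handles "1", "34" and "1/2"
    return None


def parse_frequency(text):
    """Return doses per day, or None if no known phrase is found."""
    for pattern, per_day in FREQUENCY_PATTERNS:
        match = re.search(pattern, text)
        if match:
            return int(match.group(1)) if per_day is None else per_day
    return None


def parse_sig(line):
    """Return (quantity, doses_per_day) for one prescription line."""
    text = line.strip().upper()
    return parse_quantity(text), parse_frequency(text)


if __name__ == "__main__":
    for sample in ["ONE TABLET TWICE DAILY", "1/2 PILL TWICE DAILY",
                   "1 TABLET 3 TIMES DAILY AS NEEDED", "1 tab qd"]:
        print(sample, "->", parse_sig(sample))
```

This sketch has obvious gaps: it treats the leading number as the item count even when it is a dose (as in "300 MG BID" or "INJECT 34 U TWICE A DAY"), it returns no frequency for interval phrasing such as "EVERY 4 TO 6 HOURS", and lines that are not prescriptions at all (the rolling walker) simply come back as `(None, None)`. Those cases are where a trained extraction system like the i2b2 submissions would likely do better.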