3

I am using Python to extract ICD9 codes, with the regular expression below:

icdRegex = recomp('V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d')

It captures patterns such as 137.98 or V35.62.

Everything works fine, except that the expression also captures patient weights as ICD9 codes.

What I have observed is that the weight almost always appears with a unit, e.g. 110.67 kg, kgs, lb or lbs.

How do I separate ICD9 codes from weights?
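A minimal reproduction of the problem might look like the sketch below. It assumes that `recomp` is an alias for `re.compile`, and the sample note text is made up for illustration.

```python
import re

# Assumption: `recomp` in the question stands for re.compile.
icd_regex = re.compile(r'V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d')

# Hypothetical clinical note, invented for this example.
note = "Dx 137.98 and V35.62, weight 110.67 kg"

# No capturing groups, so findall returns the full matches.
matches = icd_regex.findall(note)
print(matches)  # ['137.98', 'V35.62', '110.67']
```

The last item, 110.67, is the weight being picked up as if it were a code.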

HamZa
WeShall

2 Answers

1

Add a negative lookahead assertion like the following:

(V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d)\b(?!\s?(?:lb|kg)s?)
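As a rough sketch of how this pattern behaves in Python (the sample text is hypothetical, and `re.compile` is used directly rather than the asker's `recomp`):

```python
import re

# The pattern from this answer, written as a raw string so that \b
# reaches the regex engine as a word boundary.
icd_regex = re.compile(
    r'(V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d)\b(?!\s?(?:lb|kg)s?)'
)

# Hypothetical note, invented for this example.
note = "Dx 137.98 and V35.62, weight 110.67 kg, prior 220.5 lbs"

# One capturing group, so findall returns the group contents.
codes = icd_regex.findall(note)
print(codes)  # ['137.98', 'V35.62']

# \s? allows at most one whitespace character before the unit,
# so a weight followed by two spaces still slips through.
print(icd_regex.findall("110.67  kg"))  # ['110.67']
```

The `\b` before the lookahead matters: without it the engine could backtrack to 110.6 and still report a match. Using `\s*` instead of `\s?` would cover wider gaps between the number and the unit.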
HamZa
chapelo
1

Here is HamZa's expression for everyone:

icdRegex = recomp(r"\b(?:V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d)\b(?!\s*(?:kg|lb)s?\b)")
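One thing to watch for: in an ordinary Python string literal `\b` is a backspace character, so the pattern should be passed as a raw string. A small sketch of the expression in use, assuming `recomp` is `re.compile` and with made-up sample text:

```python
import re

# In a non-raw string literal, "\b" is the backspace character.
print("\b" == "\x08")  # True

# Raw string, so \b stays a regex word boundary.
pattern = r"\b(?:V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d)\b(?!\s*(?:kg|lb)s?\b)"
icd_regex = re.compile(pattern)

# Hypothetical note, invented for this example.
note = "Codes 032.9, V35.62, E812.0; weight 110.67   kgs; ref 1137.98"

codes = icd_regex.findall(note)
print(codes)  # ['032.9', 'V35.62', 'E812.0']
```

Here `\s*` rejects the weight even with several spaces before the unit, and the leading `\b` keeps the tail of a longer number such as 1137.98 from being read as a code.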

Thanks HamZa & Chapelo for helping out. Appreciate it.

HamZa
WeShall
  • Guys... the regex we wrote almost 2 months back fails on one condition. For numbers starting with 0 it captures the wrong pattern. For example, if the ICD9 code is 032.9, the expression returns it as 329. Any fix for codes starting with 0? – WeShall Jan 27 '15 at 21:29
  • A further refinement of this thread is [here](http://stackoverflow.com/questions/28200337/how-to-make-regex-ignore-a-pattern-following-a-specific-group) – WeShall Jan 29 '15 at 17:15