2

I did post this question 2 months back and got the following REGEX pattern to capture ICD9 codes. What is expected is to capture only ICD9 codes (ex: 134.57 or V23.54 or E33.62) and ignore patient's weight 134.57 lb or a lab result like 127.20 mg/dL.

icdRegex = recomp('(V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d)(?!\s*(?:kg|lb|mg)s?)')

Now exceptions have arised. The second part of regex does not ignore the pattern that is followed by either kg, lb, mg or any other stop words.

I can write some basic Regex but this is getting a little too complicated for my tiny brain and need help.

Community
  • 1
  • 1
WeShall
  • 409
  • 7
  • 20
  • 1
    Can you post example input data and what you want to capture? What language? – dawg Jan 28 '15 at 19:09
  • It's Python. Sample data would look something like //Type 2 diabetes mellitus (250.00) (E11.9)Hypertension (401.9) (I10) Hyperlipidemia (272.4) (E78.5) Osteopenia (733.90) (M85.80) Vitamin D deficiency (268.9) (E55.9) Weight 272.4 lb Testestore 250.0// – WeShall Jan 28 '15 at 19:15
  • Would each record have the `//` delimiter? Is the ICD9 code always in parenthesis? The more specific you are, the more robust your solution. – dawg Jan 28 '15 at 21:56
  • No the delimiters are only to mark the boundary of sample data in the post. Yes ICD9 codes will always be in the parenthesis. – WeShall Jan 28 '15 at 23:49

1 Answers1

2
(?:(?:V\d{1,2}\.\d{1,2})|(?:\d{1,3}\.\d{1,2})|(?:E\d{1,2}\.\d+))(?!\d|(?:\s*(?:kg|lb|mg)s?))

Try this.See demo.

https://regex101.com/r/pM9yO9/12

modified the lookahead to include \d in it so that partial matches are avoided

x="""134.57 or V23.54 or E33.62 134.57 lb or a lab result like 127.20 mg/dL"""
print re.findall(r"(?:(?:V\d{1,2}\.\d{1,2})|(?:\d{3}\.\d{1,2})|(?:E\d{1,2}\.\d+))(?!\d|(?:\s*(?:kg|lb|mg)s?))",x)

Output:['134.57', 'V23.54', 'E33.62']

Final version tested against the data.

icdRegex = recomp("(?:(?:V\d{1,2}.\d{1,2})|(?:\d{3}.\d{1,2})|(?:E\d{1,2}.\d+))(?!\d|(?:\s*(?:kg|lb|mg)s?))") codes = findall(icdRegex, hit)

where "hit" will be the clinical note.

WeShall
  • 409
  • 7
  • 20
vks
  • 67,027
  • 10
  • 91
  • 124
  • What did you change and why? – Bergi Jan 28 '15 at 19:12
  • I think I did not do a good job in asking the question. The first part of my Regex capturing ICD9 code is working perfectly fine, well tested. The problem is it is also capturing any weight like 134.56 lb as code because it matched the pattern. @vks : This expression captures a lot more then just the ICD9 pattern (ex: 2.5, 44.52 which is not what we want). I need help to fix the second part and make it ignore the values that are followed by the stop words. Hope I am making sense. – WeShall Jan 28 '15 at 19:26
  • It still leaves out first 134.57. An ICD9 code can have the same value as someone's weight. We bet on the regular expression to identify that the first one 134.57 is a code and second 134.57 is weight as it is followed by lb. – WeShall Jan 28 '15 at 19:53
  • @vks: it work's in the regex tester but not in my code. Is Regex for python different ? – WeShall Jan 28 '15 at 23:50
  • @vks : No what... U are right. Your regex pattern work perfectly fine. The reason it was not working for me was I was splitting up the entire clinical document as seperate lines. When I did change my flow to run the regex on the entire document, it worked like charm. Thanx a ton. – WeShall Jan 29 '15 at 16:53