
I have spent a few hours trying to work out this particular bit of regular expression magic, with little to no luck.

I have been playing around with parsing some of my own medical data (why not?) which unfortunately comes in the form of a very unstructured text document with no tags (XML or HTML).

Specifically, as a prototype, I only want to match what my LDL delta (cholesterol change) is as a percentage.

In the document, it shows up in a few different ways:

LDL change since last visit: 10%

or

LDL change since last visit:
10%

or

LDL change since last visit:

10%

I have been trying to do this in JavaScript using the native RegExp engine for a few hours (more than I want to admit) with little success. I am by no means a RegExp expert, but I have been looking at an expression like this:

(?<=LDL change since last visit)*(0*(100\.00|[0-9]?[0-9]\.[0-9]{0,2})%) 

I know this does not work in JS because the engine lacks support for ?<= (lookbehind). I tested it in Ruby, but even there it did not match. Could anybody walk me through some ways of doing this?

EDIT:

Since this particular metric shows up a few times in different areas, I would like the regex to match them all and have them be accessible in multiple groups. Say matching group 0 corresponds to the Lipid Profile section and matching group 1 corresponds to the Summary.

Lipid profile
...
LDL change since last visit:

10%
...

Summary of Important Metrics
...
LDL change since last visit: 10%
...
Alexander Ventura

1 Answer


A lookbehind solution is complicated because most languages only support fixed- or finite-length lookbehind assertions. Therefore it's easier to use a capturing group instead. (Also, the * quantifier that you placed after the lookbehind makes no sense: a lookbehind is a zero-width assertion, so there is nothing to repeat.)

And since you don't really need to validate the number (right?), I would simply do

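# subject is assumed to hold the text of the document
regexp = /LDL change since last visit:\s*([\d.]+)%/
match = regexp.match(subject)
# Keep only the captured number (group 1); nil if there was no match
ldl_change = match ? match[1] : nil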
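
Since the question targets JavaScript, here is a rough equivalent of the same idea as a minimal sketch. The helper name firstLdlChange and the sample string are made up for illustration. It relies on \s* matching newlines as well as spaces, which is what lets one pattern cover all three layouts from the question.

```javascript
// Hypothetical helper: returns the LDL change as a string, or null if absent.
function firstLdlChange(text) {
  // \s* matches spaces and newlines, so the value may sit on a later line.
  const match = /LDL change since last visit:\s*([\d.]+)%/.exec(text);
  return match ? match[1] : null;
}

// Made-up sample using the layout with a blank line before the value.
const sample = "LDL change since last visit:\n\n10%";
console.log(firstLdlChange(sample));
```

The captured value is a string; pass it through parseFloat if you need a number.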

If you expect multiple matches per string, use Ruby's String#scan, which returns one array of captures per match:

subject.scan(/LDL change since last visit:\s*([\d.]+)%/)
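
In JavaScript, one way to collect every occurrence and keep track of the section it belongs to is to loop with a global regex. The following is only a sketch: the HEADINGS list is an assumption taken from the example in the question, and ldlChangesBySection and the "(no heading)" key are hypothetical names. The pattern matches either a known heading on a line of its own or the LDL metric, and the loop remembers the most recent heading seen.

```javascript
// Assumed headings, copied from the example in the question; adjust as needed.
// They are inserted into the pattern verbatim, so they must not contain
// regex metacharacters.
const HEADINGS = ["Lipid profile", "Summary of Important Metrics"];

// Hypothetical helper: maps each heading to the LDL values found beneath it.
function ldlChangesBySection(text) {
  // Group 1 captures a heading on a line of its own; group 2 captures the number.
  const pattern = new RegExp(
    "^(" + HEADINGS.join("|") + ")[ \\t]*$" +
      "|LDL change since last visit:\\s*([\\d.]+)%",
    "gm"
  );
  const result = {};
  let section = "(no heading)"; // used for values that precede any heading
  let m;
  while ((m = pattern.exec(text)) !== null) {
    if (m[1] !== undefined) {
      section = m[1]; // entered a new section
    } else {
      (result[section] = result[section] || []).push(m[2]);
    }
  }
  return result;
}

// Made-up sample document modelled on the question's example.
const report = [
  "Lipid profile",
  "LDL change since last visit:",
  "",
  "10%",
  "",
  "Summary of Important Metrics",
  "LDL change since last visit: 10%",
].join("\n");

console.log(ldlChangesBySection(report));
```

For the sample above, the result is an object keyed by heading, with each key holding an array of the captured values as strings. Using a plain exec loop rather than a single regex with numbered groups avoids having to know in advance how many sections the document contains.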
Tim Pietzcker
  • Is there a way to change that regexp so that it gets every instance of "LDL change" in the document? The metric shows up in multiple places for different things, so I want to group the matches by heading. – Alexander Ventura Jan 30 '14 at 08:00