
I am trying to extract the text 3.81 pH from the output below. To do that, I initially wrote the regex (\d+\.\d\d)\s(pH).

<ESC>CS3<CR><LF>Date/Time 27-Oct-2022 <CR><LF> 16:01:59 <CR><LF>Sample -- <CR><LF>Status OK <CR><LF> <CR><LF>User Administrato<CR><LF> r <CR><LF> <CR><LF>Method DM <CR><LF> <CR><LF>Meas.type1 DM pH <CR><LF> Sensor MTPHSensor <CR><LF> <CR><LF>Value 3.81 pH <CR><LF> Temp. 23.6 <194><176>C ATC <CR><LF> Endpoint Automatic <CR><LF> (Standard) <CR><LF> <CR><LF> <CR><LF> <CR><LF> <CR><LF> <CR><LF>
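To check the pattern outside the black-box engine, here is a minimal Python sketch. It assumes that <ESC> stands for the escape character (`\x1b`), <CR><LF> for `"\r\n"`, and <194><176> for the two UTF-8 bytes of the degree sign "°". The exact spacing of the real device output may differ. `search` returns only the first match, while `findall` lists every match, roughly what a "global" search does.

```python
import re

# Reconstructed sample. Assumptions: <ESC> = "\x1b", <CR><LF> = "\r\n",
# <194><176> = UTF-8 bytes of "°". The original spacing may differ.
LINES = [
    "\x1bCS3",
    "Date/Time 27-Oct-2022 ",
    " 16:01:59 ",
    "Sample -- ",
    "Status OK ",
    " ",
    "User Administrato",
    " r ",
    " ",
    "Method DM ",
    " ",
    "Meas.type1 DM pH ",
    " Sensor MTPHSensor ",
    " ",
    "Value 3.81 pH ",
    " Temp. 23.6 \u00b0C ATC ",
    " Endpoint Automatic ",
    " (Standard) ",
] + [" "] * 6
SAMPLE = "\r\n".join(LINES) + "\r\n"

PATTERN = re.compile(r"(\d+\.\d\d)\s(pH)")

first = PATTERN.search(SAMPLE)   # stops at the first match
every = PATTERN.findall(SAMPLE)  # collects all matches (tuples of groups)

print(first.group(0))  # 3.81 pH
print(first.group(1))  # 3.81
print(every)           # [('3.81', 'pH')]
```

In this reconstruction the value occurs only once, so the original greedy pattern already finds it.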

However, when I tested it in the compiler engine (which I have to use):

  1. For a few moments, it showed a blank in the UI.

  2. Then it briefly showed the value 3.81 pH in the UI (much to my joy).

  3. Then, agonizingly, it went blank again and eventually stopped, leaving the result blank.

From this, I hypothesized that the compiler engine kept searching for the pattern in the rest of the text even after it found the first match, and that this is why it eventually returned a blank.

I subsequently gathered from a few similar Stack Overflow threads (linked below) that I am probably right, but I am not 100% sure and am happy to be corrected.

  1. Regular expression to stop at first match

  2. How to multiline regex but stop after first match?

Moreover, from these threads I understood that I need to make the regex lazy (non-greedy) so that the compiler engine stops at the first match and returns the matched value (which is what I want it to do).

For that, I have now modified the regex to (\d+?.\d+?)\s(pH).

This expression also matches the text 3.81 pH; I checked it on the Regex101 website. But is the placement of the ? characters really correct for what I am trying to do?
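A quick way to see what the ? changes is to run both versions on the same text. The sketch below uses Python's `re` on a short excerpt of the sample (assuming <CR><LF> is `"\r\n"`) and escapes the dot in the lazy version, because an unescaped `.` matches any character, not just a period. Both patterns return the same text, because the lazy `\d+?` after the dot still has to grow to "81" before `\s` can match.

```python
import re

# Short excerpt of the sample output (assumption: <CR><LF> = "\r\n").
EXCERPT = (
    "Meas.type1 DM pH \r\n Sensor MTPHSensor \r\n \r\n"
    "Value 3.81 pH \r\n Temp. 23.6 \u00b0C ATC \r\n"
)

GREEDY = re.compile(r"(\d+\.\d\d)\s(pH)")
LAZY = re.compile(r"(\d+?\.\d+?)\s(pH)")  # dot escaped here

# Laziness changes how much a quantifier prefers to consume inside one match,
# not whether the engine looks for further matches.
for name, pattern in [("greedy", GREEDY), ("lazy", LAZY)]:
    m = pattern.search(EXCERPT)
    print(name, repr(m.group(0)))  # both print '3.81 pH'
```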

Thanks in advance!

Arnab Roy
  • If you want the first value, you can use an anchor to assert the start of the string and a lazy dot to get to the first occurrence, then capture the value in a capture group: `^.*?\b(\d+\.\d\d\spH)\b`. What is the tool or language that you are using? – The fourth bird Nov 05 '22 at 13:19
  • Perhaps there is a global flag that you can disable as well to get the first value. – The fourth bird Nov 05 '22 at 13:20
  • Thanks @Thefourthbird for your response. I will try the expression next Monday when I am back at work. Unfortunately, I do not know what language the compiler engine is written in. I know it sounds weird, but the engine is a black box to us: there is only a UI where we enter the regex and test it against output strings like the one above. Based on my understanding of the parent application this engine belongs to, it could be written in C#. Would that information help? If so, I can try to get it from the vendor. – Arnab Roy Nov 05 '22 at 13:34
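The anchored pattern suggested in the first comment can be tried in Python as a sketch. One detail matters if the output really contains line breaks: in most engines, including Python's `re`, `.` does not match a newline unless a "dotall"/single-line flag (`s`, `re.DOTALL` in Python) is set, so the lazy `.*?` cannot reach a value on a later line without it. The excerpt below assumes real CR/LF line breaks.

```python
import re

# Excerpt of the sample (assumption: <CR><LF> are real "\r\n" line breaks).
EXCERPT = "Status OK \r\n \r\nValue 3.81 pH \r\n Temp. 23.6 \u00b0C ATC \r\n"

# Anchor at the start of the string, lazily skip ahead, capture the first value.
ANCHORED = r"^.*?\b(\d+\.\d\d\spH)\b"

without_dotall = re.search(ANCHORED, EXCERPT)
with_dotall = re.search(ANCHORED, EXCERPT, flags=re.DOTALL)

print(without_dotall)        # None: "." cannot cross the "\n" characters
print(with_dotall.group(1))  # 3.81 pH
```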

1 Answer


Yes, the lazy version of your expression would be (\d+?\.\d+?)\s(pH) (with the dot escaped, since an unescaped . matches any character). But laziness is probably not your problem.

\d+ matches one or more digits and tries to match as many as possible. Adding ? (\d+?) makes it try to match as few digits as it can while still completing the match.

However, once \d+ reaches a character that is not a digit (for example, the character right after the digits of your value), it stops there whether the quantifier is greedy or lazy, and the rest of the pattern continues from that point. So if your engine keeps searching past the value, that is not because of the quantifier but because of how the regex is configured.
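The quantifier behaviour described above can be checked with a small Python sketch. Python's `re` is used only because it is easy to run, and the inputs are made up for illustration.

```python
import re

# Greedy: \d+ takes as many digits as it can.
greedy = re.match(r"\d+", "381").group()              # "381"
# Lazy: \d+? stops after one digit when nothing forces it to continue.
lazy = re.match(r"\d+?", "381").group()               # "3"
# Still lazy, but the following \s forces \d+? to expand over all digits.
lazy_forced = re.match(r"\d+?\s", "381 pH").group()   # "381 "
# Greedy or not, \d+ stops at the first non-digit (here the ".").
stops_at_dot = re.match(r"\d+", "3.81").group()       # "3"

print(greedy, lazy, repr(lazy_forced), stops_at_dot)  # 381 3 '381 ' 3
```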

A regex has two parts: the expression (which defines what to match) and the flags (which define how to match it). In many tools, a regex is written like this:

/expression/flags

with the flags after the final /. If that is not how your tool works, there must be some other place in your engine to configure them. If you find it, remove the "g" flag. It stands for "global" and means that the expression does not stop after the first match.

You can see this in one of the examples from the questions you linked, https://regex101.com/r/ahVkw1/3, where the flags are shown at the end.

Remove the "g" and you will see that only the first occurrence is matched.
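Not every engine spells this as a "g" flag. Python's `re`, for instance, has no such flag. The choice is made by the function: `search` returns the first match, and `finditer`/`findall` return all of them. .NET's `Regex` class likewise splits this into `Match` (first match) and `Matches` (all matches). So, if the engine really is C#-based as suspected, whether it returns one match or all of them may depend on the vendor's code rather than on anything in the pattern; that is an inference, not something confirmed here. The two-reading text below is invented purely for illustration.

```python
import re

# Hypothetical text with two readings (not from the question).
TEXT = "Value 3.81 pH\r\nValue 4.02 pH\r\n"
PATTERN = re.compile(r"(\d+\.\d\d)\s(pH)")

# Comparable to a regex without "g": stop at the first match.
first = PATTERN.search(TEXT)
print(first.group(1))  # 3.81

# Comparable to a regex with "g": keep going and collect every match.
all_values = [m.group(1) for m in PATTERN.finditer(TEXT)]
print(all_values)  # ['3.81', '4.02']
```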

Daniel Cruz