Regular expression for multiple lines in a web service response

Question

I would like to use a regex to capture part of the response I receive from a web service call. The response is shown below; I am only interested in capturing the status of the ContactMessageTransport queue.

The status of the queue is sent in the line:

<pogo:Status>Started</pogo:Status>

and the queue name in the line:

<pogo:Name>ContactMessageTransport</pogo:Name>

I used <pogo:Name>ContactMessageTransport[\w\W]*Started<\/pogo:Status>, but it also captures the status of the other queues, which I am not interested in. I am finding it hard to match just these few lines. Can you please help?

 <getAllMessageQueueInfoResponse xmlns="http:abcd.com/MessageQueueAnalyticsAPI">
     <return>
        <Entry xmlns:pogo="http://example.com/com/integration/services/messagequeueanalyticsservice">
           <pogo:AckCount>0</pogo:AckCount>
           <pogo:DestinationID>0</pogo:DestinationID>
           <pogo:ErrorCount>25</pogo:ErrorCount>
           <pogo:ID>67</pogo:ID>
           <pogo:Latest>2017-11-28T00:00:00-05:00</pogo:Latest>
           <pogo:Name>ContactMessageTransport</pogo:Name>
           <pogo:NotAckCount>0</pogo:NotAckCount>
           <pogo:Oldest>2017-11-28T00:00:00-05:00</pogo:Oldest>
           <pogo:RetryableErrorCount>31</pogo:RetryableErrorCount>
           <pogo:SkippedCount>0</pogo:SkippedCount>
           <pogo:Status>Started</pogo:Status>
           <pogo:UnsentCount>212</pogo:UnsentCount>
        </Entry>
        <Entry xmlns:pogo="http://example.com/com/integration/services/messagequeueanalyticsservice">
           <pogo:AckCount>0</pogo:AckCount>
           <pogo:DestinationID>0</pogo:DestinationID>
           <pogo:ErrorCount>0</pogo:ErrorCount>
           <pogo:ID>65</pogo:ID>
           <pogo:Latest>2018-03-17T00:00:00-04:00</pogo:Latest>
           <pogo:Name>Email</pogo:Name>
           <pogo:NotAckCount>0</pogo:NotAckCount>
           <pogo:Oldest>2018-03-17T00:00:00-04:00</pogo:Oldest>
           <pogo:RetryableErrorCount>4</pogo:RetryableErrorCount>
           <pogo:SkippedCount>0</pogo:SkippedCount>
           <pogo:Status>Started</pogo:Status>
           <pogo:UnsentCount>0</pogo:UnsentCount>
        </Entry>

You'll be better off using an XML parser than trying to regex this. That being said, if you must use a regex, what language are you using here, Python? – sniperd Jun 11 '18 at 15:01
I can help with the regex and could put together a simple Python example that manipulates the text, but I don't know Java well enough. If that is helpful, let me know and I'll write up an answer. – sniperd Jun 11 '18 at 15:08
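
For reference, the XML parser route suggested above could look roughly like the sketch below. It uses Python's standard `xml.etree.ElementTree`; the sample is a trimmed copy of the response from the question, and the closing `</return>` and `</getAllMessageQueueInfoResponse>` tags are an assumption, since the posted response is cut off. The helper name `queue_status` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question. The two closing tags at
# the end are assumed, because the posted response is truncated.
RESPONSE = """\
<getAllMessageQueueInfoResponse xmlns="http:abcd.com/MessageQueueAnalyticsAPI">
  <return>
    <Entry xmlns:pogo="http://example.com/com/integration/services/messagequeueanalyticsservice">
      <pogo:Name>ContactMessageTransport</pogo:Name>
      <pogo:Status>Started</pogo:Status>
    </Entry>
    <Entry xmlns:pogo="http://example.com/com/integration/services/messagequeueanalyticsservice">
      <pogo:Name>Email</pogo:Name>
      <pogo:Status>Started</pogo:Status>
    </Entry>
  </return>
</getAllMessageQueueInfoResponse>
"""

# Prefix-to-URI map for the lookups. "api" is an arbitrary local prefix for
# the default namespace that the unprefixed <Entry> elements inherit.
NS = {
    "api": "http:abcd.com/MessageQueueAnalyticsAPI",
    "pogo": "http://example.com/com/integration/services/messagequeueanalyticsservice",
}


def queue_status(xml_text, queue_name):
    """Return the Status text of the entry named queue_name, or None."""
    root = ET.fromstring(xml_text)
    for entry in root.iterfind(".//api:Entry", NS):
        if entry.findtext("pogo:Name", namespaces=NS) == queue_name:
            return entry.findtext("pogo:Status", namespaces=NS)
    return None


print(queue_status(RESPONSE, "ContactMessageTransport"))  # Started
```

Because each lookup stays inside one `<Entry>` element, the status of a neighbouring queue cannot leak into the result.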

Nicolas · Accepted Answer · 2018-06-12T14:07:52.973


Is it possible that you are simply missing the lazy `?` modifier? Placed after a quantifier, it makes the quantifier match the shortest possible sequence instead of the longest.

ContactMessageTransport[\w\W]*?Started<\/pogo:Status>
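
The difference between the greedy and the lazy quantifier can be checked with a small Python sketch. The sample is an abbreviated version of the response from the question (most fields left out), so treat it as an illustration rather than the real payload.

```python
import re

# Abbreviated two-queue sample based on the response in the question.
SAMPLE = (
    "<pogo:Name>ContactMessageTransport</pogo:Name>\n"
    "<pogo:Status>Started</pogo:Status>\n"
    "</Entry>\n"
    "<Entry>\n"
    "<pogo:Name>Email</pogo:Name>\n"
    "<pogo:Status>Started</pogo:Status>\n"
    "</Entry>\n"
)

# Greedy: [\w\W]* runs to the LAST "Started</pogo:Status>" in the text.
greedy = re.search(r"ContactMessageTransport[\w\W]*Started<\/pogo:Status>", SAMPLE)

# Lazy: [\w\W]*? stops at the FIRST "Started</pogo:Status>" after the name.
lazy = re.search(r"ContactMessageTransport[\w\W]*?Started<\/pogo:Status>", SAMPLE)

print("Email" in greedy.group(0))  # True: the second queue was swallowed
print("Email" in lazy.group(0))    # False: only the first queue matched
```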

EDIT: Assuming there are always 8 lines to match:

ContactMessageTransport([^\r\n]*[\r\n]){8}
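
A quick Python sketch of how the line counting behaves, using the eight lines from the queue name down to `</Entry>` as they appear in the question. One caveat: `[\r\n]` consumes a single character, so on text with Windows-style `\r\n` line endings each line uses up two repetitions and `{8}` only covers four lines. The `\r?\n` variant below is an adjustment for that case, not part of the original pattern.

```python
import re

# Lines from the question, starting at the queue name.
LINES = [
    "<pogo:Name>ContactMessageTransport</pogo:Name>",
    "<pogo:NotAckCount>0</pogo:NotAckCount>",
    "<pogo:Oldest>2017-11-28T00:00:00-05:00</pogo:Oldest>",
    "<pogo:RetryableErrorCount>31</pogo:RetryableErrorCount>",
    "<pogo:SkippedCount>0</pogo:SkippedCount>",
    "<pogo:Status>Started</pogo:Status>",
    "<pogo:UnsentCount>212</pogo:UnsentCount>",
    "</Entry>",
    "<Entry>",
    "<pogo:Name>Email</pogo:Name>",
]
lf = "\n".join(LINES) + "\n"        # Unix line endings
crlf = "\r\n".join(LINES) + "\r\n"  # Windows line endings

# Pattern exactly as given in the answer.
ORIGINAL = re.compile(r"ContactMessageTransport([^\r\n]*[\r\n]){8}")
# Variant (an assumption) that treats \r\n as one line break.
VARIANT = re.compile(r"ContactMessageTransport(?:[^\r\n]*\r?\n){8}")

print(ORIGINAL.search(lf).group(0).endswith("</Entry>\n"))      # True
print("Started" in ORIGINAL.search(crlf).group(0))              # False: stops after 4 lines
print(VARIANT.search(crlf).group(0).endswith("</Entry>\r\n"))   # True
```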

EDIT 2:

ContactMessageTransport[\s\S]*?Started(?:[^\r\n]*[\r\n]){3}

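[\s\S]*? matches any characters, line breaks included, but as few as possible, up to the first Started.
(?:[^\r\n]*[\r\n]){3} matches through 3 line breaks after Started: the rest of the Started line and the two lines that follow it. The added ?: only prevents the parentheses from creating a capturing group, which isn't needed here. (This is called a "non-capturing group".)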
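
The second edit can be tried out with the Python sketch below. The sample is abbreviated, and the "Suspended" status is a hypothetical value used only to show a limitation: the lazy quantifier does not stop at `</Entry>`, so if the first queue is not Started, the match runs on to the next queue that is.

```python
import re

# Pattern exactly as given in the second edit.
PATTERN = re.compile(r"ContactMessageTransport[\s\S]*?Started(?:[^\r\n]*[\r\n]){3}")


def build_sample(first_status):
    """Two abbreviated entries; only the first queue's status varies."""
    return (
        "<pogo:Name>ContactMessageTransport</pogo:Name>\n"
        f"<pogo:Status>{first_status}</pogo:Status>\n"
        "<pogo:UnsentCount>212</pogo:UnsentCount>\n"
        "</Entry>\n"
        "<Entry>\n"
        "<pogo:Name>Email</pogo:Name>\n"
        "<pogo:Status>Started</pogo:Status>\n"
        "<pogo:UnsentCount>0</pogo:UnsentCount>\n"
        "</Entry>\n"
    )


started = PATTERN.search(build_sample("Started"))
print(started.group(0).endswith("</Entry>\n"))  # True: ends at the first </Entry>
print("Email" in started.group(0))              # False
print(started.groups())                         # (): (?:...) captures nothing

# Hypothetical status value, used to show the match spilling into the next entry.
suspended = PATTERN.search(build_sample("Suspended"))
print("Email" in suspended.group(0))            # True
```

If the check has to fail when the ContactMessageTransport queue is not Started, it is safer to also assert that the matched text does not contain a second `<pogo:Name>`.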

edited Jun 12 '18 at 14:07

answered Jun 11 '18 at 15:18

Nicolas


Nice. That seems to have done the trick. I would like to enhance this a bit more: what if I need to limit the regex to check only the next two lines after the word Started? I am trying to see if I can match from the line that contains the queue name up to `</Entry>`. This way I get to validate the response for one queue completely. – neomoto Jun 11 '18 at 15:44
Just replace the `<\/pogo:Status>` part with `<\/Entry>` – Nicolas Jun 11 '18 at 15:46
Thanks Nicolas. Sorry for not being clear. I would like to achieve the same result using an alternative way using multiline check like this one https://stackoverflow.com/questions/37687883/how-to-match-n-number-of-lines-with-regex using { and } characters. – neomoto Jun 11 '18 at 15:56
See my edit if it works. – Nicolas Jun 11 '18 at 16:00
Thanks again. This works as expected. I am asking the following for my own understanding: \r matches the carriage return character and \n matches the linefeed (newline) character, but is ^ what is used for multi-line matching? I assume the number 8 indicates how many lines the regex should consider. Any tips from you would help my understanding. Thank you again. – neomoto Jun 11 '18 at 16:07
The `^` inside the brackets means it matches any character but the ones specified. So `[^\r\n]*` matches all characters until the line break, matched with `[\r\n]`. Because line breaks differ depending on the OS, this is a safe way to detect them. And yes the 8 is for 8 lines (8 line breaks actually). Don't forget to upvote! – Nicolas Jun 11 '18 at 16:19
Hello Nicolas, I have an additional requirement. I would also like to consider the word "Started" as part of my regex and consider only the 8 lines just like the expression ContactMessageTransport([^\r\n]*[\r\n]){8}. I just need to add the word Started here. Can you help here? – neomoto Jun 12 '18 at 12:34
See the second edit. – Nicolas Jun 12 '18 at 14:08
Awesome. Thank you very much Nicolas. The solution and comments were very helpful – neomoto Jun 12 '18 at 15:23
@neomoto Don't forget to upvote! – Nicolas Jun 12 '18 at 16:17
