
I am using C# to parse XML and ran into the following. This is not exactly what I am doing, but it is the same idea. The Singleline option is turned on, so `.` also matches newlines.

So if I have a string:

Start xxx A xxx Pattern xxx End
Start xxx B xxx Pattern xxx End
Start xxx C xxx Pattern xxx End

If I want to extract "Start xxx B xxx Pattern xxx End", I use

(Start.*?B.*?Pattern.*?End)

But this actually matches:

Start xxx A xxx Pattern xxx End
Start xxx B xxx Pattern xxx End

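A minimal repro of the behaviour above, assuming the three rows are joined with "\n" (the variable names `input` and `lazy` are only for illustration). It suggests that the lazy quantifier is not the problem by itself: the engine still takes the leftmost possible starting point, which is the "Start" of row A.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question; joining the rows with '\n' is an assumption.
string input =
    "Start xxx A xxx Pattern xxx End\n" +
    "Start xxx B xxx Pattern xxx End\n" +
    "Start xxx C xxx Pattern xxx End";

// Singleline makes '.' match '\n' as well, so '.*?' can cross row boundaries.
Match lazy = Regex.Match(input, "(Start.*?B.*?Pattern.*?End)", RegexOptions.Singleline);

// The engine tries the leftmost start position first (the "Start" of row A),
// then expands the first '.*?' only as far as needed to reach a "B",
// and the first "B" in the input is in row B.
Console.WriteLine(lazy.Index);  // offset where the match begins
Console.WriteLine(lazy.Value);  // text of the match, spanning rows A and B
```

So "lazy" here only means the shortest match from a given starting position, not the shortest match overall.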
So I said, ok, maybe I should use:

.*(Start.*?B.*?Pattern.*?End)

This does extract what I want, but if I have

Start xxx A xxx Pattern xxx End
....
Start xxx B xxx Pattern xxx End
Start xxx C xxx Pattern xxx End

the overall match covers everything below,

Start xxx A xxx Pattern xxx End
....
Start xxx B xxx Pattern xxx End

and only the capture group holds what I want. I suspect this huge match is what makes my program slow.

So is there a way to specify that I want the shortest possible match, i.e. one that matches just this?

Start xxx B xxx Pattern xxx End

P.S. The "xxx" parts of the string do not follow any particular pattern.

I believe it is a greedy vs. non-greedy issue. Can anyone help?

  • What exactly are you trying to do? Because in general it's going to be faster and easier to use XmlDocument or LINQ to XML to parse XML rather than using regular expressions. For the greedy stuff, there's http://stackoverflow.com/questions/3898210/greedy-non-greedy-all-greedy-matching-in-c-sharp-regex?rq=1 – Heretic Monkey Sep 06 '16 at 22:11
  • Even though it is usually a bad idea to parse XML with regexes, here's a suggestion: try to replace the `.*?` with something more restrictive like `[^....]*?` to ensure that it will not match against `Start`, or whatever `Start` represents in the original XML. – redneb Sep 06 '16 at 22:17
  • @redneb I would like to be able to get every single row, so maybe this would not allow me to match the first row. But Thanks! – Tianbaba Sep 06 '16 at 22:21
  • @MikeMcCaughan Thanks for the suggestions, I think I will go with them. XmlDocument seems pretty easy to interface, but is the document searchable? But I would still like to know the answer to this question, lol. Thanks! – Tianbaba Sep 06 '16 at 23:13
  • I don't see any XML in your post. Are you trying to parse the plain text? – Alexander Petrov Sep 06 '16 at 23:36
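
One way to sketch redneb's suggestion of a more restrictive `.*?`, assuming that the literal "Start" never occurs inside a row: replace each `.*?` with `(?:(?!Start).)*?`, which consumes one character at a time but refuses to step onto the beginning of another "Start". The names `input`, `pattern` and `m` are only for illustration, and this is a sketch under that assumption rather than a general solution.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question; joining the rows with '\n' is an assumption.
string input =
    "Start xxx A xxx Pattern xxx End\n" +
    "Start xxx B xxx Pattern xxx End\n" +
    "Start xxx C xxx Pattern xxx End";

// (?:(?!Start).)*? matches any character lazily, but the negative lookahead
// stops it from crossing into the next "Start", so a match cannot span rows.
string gap = "(?:(?!Start).)*?";
string pattern = "Start" + gap + "B" + gap + "Pattern" + gap + "End";

Match m = Regex.Match(input, pattern, RegexOptions.Singleline);

// An attempt beginning at row A fails, because no "B" is reachable
// before the next "Start"; the engine then moves on to row B.
Console.WriteLine(m.Success);
Console.WriteLine(m.Value);
```

Because no leading `.*` is needed, the overall match no longer includes the rows before the one being extracted. If the "xxx" of an earlier row could itself contain a "B", that earlier row would match instead, so the literal "B" would need to be made more specific.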
