
I have this regex string:

 ("prodcssstart.*prodcssend","xx")

and my file like this:

 prodcssstart
<!--<link href="content/bundles/css.min.css" rel="stylesheet" />-->
 prodcssend

But when I run it, it fails to find a match.

Can someone suggest what I might be doing wrong? I've simplified the pattern as much as I can, but I suspect the problem is the .* that I use to match everything. Any help would be much appreciated.

Alan2
  • Which regex system are you using? Are the parentheses meant to be grouping characters or literals? There's no `,"xx"` in your text, so it really isn't clear what you are trying to do (unless those are the search and replace strings inside function call parentheses, but that just goes to show that you need to be a lot clearer about what you're up to). Also, does your regex system match across newlines by default? – Jonathan Leffler Mar 27 '16 at 06:23
  • In most regex conventions, dot does not match newline. – Gene Mar 27 '16 at 06:26
  • Also see https://meta.stackoverflow.com/questions/285733/should-give-me-a-regex-that-does-x-questions-be-closed – sashoalm Mar 27 '16 at 09:19

2 Answers


First of all, you can't parse HTML with regex.

But in your case the problem is most likely caused by newlines: by default the dot does not match a newline, so .* cannot get from prodcssstart on one line to prodcssend on a later line. You need to pass the switch that makes the dot match newlines as well (e.g. re.DOTALL in Python, or the /s modifier in Perl).
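As a minimal sketch of that fix in Python (the question does not say which regex engine is in use, so Python is an assumption here), the snippet below runs the pattern from the question with and without re.DOTALL. The variable `text` is a hypothetical in-memory copy of the file shown in the question.

```python
import re

# Hypothetical in-memory copy of the file from the question.
text = (
    " prodcssstart\n"
    '<!--<link href="content/bundles/css.min.css" rel="stylesheet" />-->\n'
    " prodcssend\n"
)

pattern = "prodcssstart.*prodcssend"

# Without DOTALL, "." cannot cross a newline, so the text is left unchanged.
no_flag = re.sub(pattern, "xx", text)

# With DOTALL, "." also matches "\n", so the whole block is replaced.
with_flag = re.sub(pattern, "xx", text, flags=re.DOTALL)

print(no_flag == text)
print(repr(with_flag))
```

Other engines expose the same behaviour under a different name, so the flag to look for depends on the regex system actually in use.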

wRAR

You could come up with a tempered greedy token solution:

prodcssstart
(?:(?!prodcssend).)*
prodcssend

Breakdown:

  • look for prodcssstart
  • match any character, one at a time, as long as prodcssend does not start at that position
  • match prodcssend
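The breakdown above can be sketched in Python, again as an assumption about the engine. re.VERBOSE allows the pattern to be spread over several lines, and re.DOTALL is still needed because the token is built on the dot. The sample `text` with two marker blocks is hypothetical; it is there to show that the tempered pattern stops at the first prodcssend, whereas a greedy .* with DOTALL runs through to the last one.

```python
import re

# Tempered greedy token, written in verbose mode so it can span lines.
tempered = re.compile(
    r"""
    prodcssstart            # opening marker
    (?:(?!prodcssend).)*    # any character, unless prodcssend starts here
    prodcssend              # closing marker
    """,
    re.VERBOSE | re.DOTALL,
)

# Hypothetical input with two separate marker blocks.
text = (
    "prodcssstart\nfirst block\nprodcssend\n"
    "keep this line\n"
    "prodcssstart\nsecond block\nprodcssend\n"
)

# Each block is replaced on its own; the line in between survives.
result = tempered.sub("xx", text)

# For comparison: greedy ".*" with DOTALL swallows everything up to the
# last prodcssend, including the line between the two blocks.
greedy = re.sub(r"prodcssstart.*prodcssend", "xx", text, flags=re.DOTALL)

print(repr(result))
print(repr(greedy))
```

With only one block in the file, as in the question, both patterns give the same result; the tempered version matters once the markers can appear more than once.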

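See a demo on regex101.com (mind the different modifiers: the pattern is split across lines, so it needs free-spacing mode, and the dot must still be allowed to match newlines).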

Jan