Say I have this HTML:

<a href="http://google.com">this is a search engine</a>

How can I find "engine" and then match everything backwards until "this" is reached?

I know I can use this.*?engine, but that matches from left to right (forward matching). Here I want to read backwards, if that is possible at all.

Wiktor Stribiżew
Johny Leo
    Use lookbehind. https://stackoverflow.com/questions/3839702/how-can-i-use-lookbehind-in-a-c-sharp-regex-in-order-to-skip-matches-of-repeated – Raymond Reddington Dec 12 '19 at 07:50

2 Answers

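You could reverse all the strings and then perform a normal left-to-right search:

using System.Linq;
using System.Text.RegularExpressions;

string text = @"<a href=""http://google.com""> this is a search engine </a>";
string engine = "engine";
string strThis = "this";

// Reverse the input and both words, match left to right, then reverse the match back.
string reversedText = new string(text.Reverse().ToArray());
// Escape after reversing so that any regex metacharacters in the words are treated literally.
string pattern = Regex.Escape(new string(engine.Reverse().ToArray()))
    + ".+"
    + Regex.Escape(new string(strThis.Reverse().ToArray()));

string result = new string(Regex.Match(reversedText, pattern).Value.Reverse().ToArray());
// result: "this is a search engine"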

Also, to make the code clearer, you could define an extension method on string that reverses the string and returns a string instead of IEnumerable<char>. See this for reference.
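
A minimal sketch of such an extension method, and of the reversed search written with it. The names StringExtensions, ReverseString, ReversedSearch and MatchBackwards are hypothetical, chosen only for illustration. It reverses UTF-16 chars one by one, so it assumes the text has no surrogate pairs or combining characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Hypothetical helper: returns a new string with the chars in reverse order.
    // Assumption: no surrogate pairs or combining marks in the input.
    public static string ReverseString(this string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}

public static class ReversedSearch
{
    // Hypothetical wrapper around the reversed search from this answer.
    // Finds the span that ends at `last` and begins at `first`,
    // by searching the reversed text for reversed(last) .+ reversed(first).
    public static string MatchBackwards(string text, string first, string last)
    {
        string pattern = Regex.Escape(last.ReverseString())
            + ".+"
            + Regex.Escape(first.ReverseString());
        return Regex.Match(text.ReverseString(), pattern).Value.ReverseString();
    }
}
```

With the text from above, ReversedSearch.MatchBackwards(text, "this", "engine") should return "this is a search engine". Because .+ is greedy, the match extends back to the earliest "this" in the original text; a lazy .+? would stop at the nearest one instead.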

Michał Turczyn
  • cool, interesting solution. I usually always use regex (yes slower) but since yours works and i bet it's faster so I'll give it a shot. – Johny Leo Dec 12 '19 at 08:40

First, always parse HTML with a dedicated tool, see What is the best way to parse html in C#? for possible options.

Once the HTML is parsed you can get plain text to run your regex against.

You can still use your this.*?engine regex, but enable the RegexOptions.RightToLeft option, possibly combined with RegexOptions.Singleline so that . also matches line breaks between the two words:

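// Regex.Match never returns null, so .Value can be read directly (it is empty when nothing matches).
var result = Regex.Match(text, @"this.*?engine", RegexOptions.Singleline | RegexOptions.RightToLeft).Value;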

See the online regex demo.

As per the documentation, RegexOptions.RightToLeft

Gets a value that indicates whether the regular expression searches from right to left.

C# demo:

var text = "blah blah this is a this search engine blah";
var result = Regex.Match(text, @"this.*?engine",
        RegexOptions.Singleline | RegexOptions.RightToLeft).Value;
Console.WriteLine(result); // => this search engine
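
To make the effect of the option visible, here is a small sketch that runs the same pattern in both directions. The class and method names (DirectionDemo, LeftToRight, RightToLeft) are hypothetical; only Regex.Match and the RegexOptions values come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DirectionDemo
{
    public const string Pattern = @"this.*?engine";

    // Default direction: the match starts at the leftmost "this",
    // and the lazy .*? stops at the first "engine" after it.
    public static string LeftToRight(string text) =>
        Regex.Match(text, Pattern, RegexOptions.Singleline).Value;

    // Right-to-left: the match is anchored at the rightmost "engine",
    // and the lazy .*? stops at the nearest "this" before it.
    public static string RightToLeft(string text) =>
        Regex.Match(text, Pattern, RegexOptions.Singleline | RegexOptions.RightToLeft).Value;
}
```

For the demo text "blah blah this is a this search engine blah", LeftToRight returns "this is a this search engine", while RightToLeft returns "this search engine".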
Wiktor Stribiżew