Get value between unknown string

Question

I'm trying to pull out a string between two other strings. To make it more complicated, the preceding contents will often differ.

[Screenshot of the HTML source being searched]

The string I'm trying to retrieve is Christchurch.

The regex I have so far is (?<=300px">).*(?=</td). It pulls out the string I'm looking for, but it also returns dozens of other strings throughout the LARGE text file I'm searching.

What I'd like to do is limit the prefix so that the search starts from Office: and runs all the way to 300px">, but the contents between those two strings will sometimes differ depending on user preferences.

To put it in crude non regex terms I want to do the following: Starting at Office: all the way to 300px> find the string that starts here and ends with </td. Thus resulting in Christchurch.

Greg the Incredulous · Answer 1 · 2018-04-13T03:33:27.693

Have you considered using the HTMLAgilityPack instead? It's a Nuget package for handling HTML which is able to handle malformed HTML pretty well. Most on Stack Overflow would recommend against using Regex for HTML - see here: RegEx match open tags except XHTML self-contained tags

Here's how you'd do it for your example:

using System;
using HtmlAgilityPack; // This is a NuGet package!
var html = @"<tr >
               <td align=""right"" valign=""top""><strong>Office:</strong>&nbsp; </td>
               <td align=""left"" class=""stippel"" style=""white-space: wrap;max-width:300px"">Christchurch </td>
             </tr>";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

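// SelectSingleNode is called on DocumentNode, not on HtmlDocument itself
var node = htmlDoc.DocumentNode.SelectSingleNode("//td[@class='stippel']");
Console.WriteLine(node.InnerText.Trim()); // trim the trailing space inside the cell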

I haven't tested this code but it should do what you need.

The advantage to this is that you can probably look for the tag with your class in it and just pull out its value. — adamdc78, Mar 03 '15 at 00:36

adamdc78 · Answer 2 · 2015-03-03T01:13:50.847

The issue you're encountering is that * is greedy. Use the lazy/reluctant version *?.

Office:[\s\S]*?300px">(.*?)</td

This solution uses a capturing group rather than look-arounds, so the value you want is in group 1 rather than in the overall match.
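As a rough illustration of the capture-group approach in C#, here is a minimal sketch. The sample HTML is only modelled on the two cells from the question, and the variable names are made up for the example; the second row is there to show that a later 300px"> cell is ignored.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input modelled on the question's table row (assumed layout, not the real file).
string html =
    "<tr>\n" +
    "  <td align=\"right\" valign=\"top\"><strong>Office:</strong>&nbsp; </td>\n" +
    "  <td align=\"left\" class=\"stippel\" style=\"white-space: wrap;max-width:300px\">Christchurch </td>\n" +
    "</tr>\n" +
    "<tr><td style=\"max-width:300px\">Some other value</td></tr>";

// [\s\S]*? lazily skips from "Office:" to the nearest 300px">,
// and the lazy group (.*?) stops at the first </td that follows.
Match match = Regex.Match(html, "Office:[\\s\\S]*?300px\">(.*?)</td");

// match.Value covers everything from "Office:" onwards; Groups[1] holds only the cell text.
string office = match.Success ? match.Groups[1].Value.Trim() : string.Empty;
Console.WriteLine(office);
```

Reading Groups[1] instead of match.Value is what keeps the result down to the cell text, which is the point raised in the comments below.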

edited Mar 03 '15 at 01:13

answered Mar 03 '15 at 00:31


That will not help. As mentioned in the post that regex '(?<=300px">).*(?= – Mike Stephens Mar 03 '15 at 00:38
Still not exactly what I'm looking for. The revised regex returns **everything** from Office to – Mike Stephens Mar 03 '15 at 01:20
I've included a link to a screenshot of the html code I'm working with. I could not attach a picture because my rep is too low. – Mike Stephens Mar 03 '15 at 01:34
It will match everything, but the first group will be the one you want. – adamdc78 Mar 03 '15 at 01:36

score 0 · Answer 3 · answered Mar 03 '15 at 00:50

I guess you need something like this:

office.*\n.*|(?<=300px">).*(?=<\/td)

answered Mar 03 '15 at 00:50

chouaib


Similar to the answer offered by adamdc78 this is not exactly what I want. I only want to retrieve the string **Christchurch**. – Mike Stephens Mar 03 '15 at 01:22
**Starting at Office: all the way to 300px> find the string that starts here and ends with – chouaib Mar 03 '15 at 01:23
Can you not see the source code in my initial post? – Mike Stephens Mar 03 '15 at 01:27
I can't see how this doesn't work for you! Doesn't it retrieve only **Christchurch**? I re-checked your initial post and I don't see where you're stuck now – chouaib Mar 03 '15 at 01:31
I've included a link to a screenshot of the html code I'm working with. I could not attach a picture because my rep is too low. – Mike Stephens Mar 03 '15 at 01:34

score 0 · Accepted Answer · answered Mar 03 '15 at 01:39

Thanks to the posts from adamdc78 and greg I have been able to come up with the regex below. This is exactly what I needed.

Thanks for your help.

(?<=office.*\n.*300px">).*(?=<\/td)
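For anyone trying this pattern, a small C# sketch follows. It relies on two assumptions that the post does not state: the match is run case-insensitively (the pattern says office, the HTML says Office:), and the engine allows variable-length lookbehind, which .NET does and many other regex engines do not. The sample HTML and variable names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input (assumed layout): an unrelated 300px"> cell comes first,
// followed by the Office row from the question.
string html =
    "<tr><td style=\"max-width:300px\">Some other value</td></tr>\n" +
    "<tr>\n" +
    "  <td align=\"right\" valign=\"top\"><strong>Office:</strong>&nbsp; </td>\n" +
    "  <td align=\"left\" class=\"stippel\" style=\"white-space: wrap;max-width:300px\">Christchurch </td>\n" +
    "</tr>";

// The lookbehind requires "office", the rest of that line, a newline,
// and then 300px"> on the following line, so the first row is skipped.
// RegexOptions.IgnoreCase is an assumption so that "office" matches "Office:".
Match m = Regex.Match(
    html,
    @"(?<=office.*\n.*300px"">).*(?=<\/td)",
    RegexOptions.IgnoreCase);

// With look-arounds the overall match is already just the cell text.
string office = m.Success ? m.Value.Trim() : string.Empty;
Console.WriteLine(office);
```

Note that the pattern expects the value cell to sit on the line directly after the Office: line; if the layout in the file differs, the lookbehind will not match.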

answered Mar 03 '15 at 01:39

Mike Stephens


welcome to StackOverflow: you should accept their answers (since they helped) and not add a *thank you answer* – chouaib Mar 03 '15 at 01:45
