
I have a stream reader whose content contains multiple IP addresses. I want to automatically extract every IP address and its port from the stream.

Basically, it's the response to a GET request, and every IP and port is represented like this:

<th>xx.xx.xx.xx</th>
<th>port</th>

I already have a regular expression that gets the IP. What I want to do is, for each match found, get the match, move "</th><th>".Length characters forward, retrieve the port, and add the result to a list as an IPAddress object.

The problem is: how can this be done when the regex needs to return multiple results?

Bodokh

1 Answer

At the risk of summoning all sorts of foul creatures (and I'm not mainly referring to SO users), here's a little unit test for you:

[TestMethod]
public void RegexTest()
{
    var input = "<th>192.168.1.1</th>\r<th>443</th>";

    var regex = @"(?s)<th>([0-9\.]*?)</th>.*?<th>([0-9]*?)</th>";
    var matches = Regex.Matches(input, regex);

    foreach (Match match in matches)
        Console.WriteLine("IP: {0}, port: {1}", match.Groups[1].Value, match.Groups[2].Value);
}

The problem, and one of the reasons you should generally avoid using regexes to parse HTML, is that the exact formatting of the input becomes very important. For instance, the test above breaks if the input contains <th> 443</th> instead.
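If you stay with a regex anyway, one way to soften that fragility is to allow optional whitespace around each value and to validate what was captured before storing it. The following is only a sketch under assumptions: the class name EndpointExtractor and the method Extract are made up for illustration, and it stores each pair as an IPEndPoint rather than an IPAddress, since an IPAddress has no place to hold a port.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class EndpointExtractor
{
    // \s* tolerates whitespace around the values and between the two cells.
    private static readonly Regex CellPair = new Regex(
        @"<th>\s*(\d{1,3}(?:\.\d{1,3}){3})\s*</th>\s*<th>\s*(\d{1,5})\s*</th>",
        RegexOptions.IgnoreCase);

    public static List<IPEndPoint> Extract(string html)
    {
        var endpoints = new List<IPEndPoint>();

        foreach (Match match in CellPair.Matches(html))
        {
            IPAddress address;
            int port;

            // The pattern only checks the shape of the text, so parse and
            // range-check the captured values before keeping them.
            if (IPAddress.TryParse(match.Groups[1].Value, out address) &&
                int.TryParse(match.Groups[2].Value, out port) &&
                port <= IPEndPoint.MaxPort)
            {
                endpoints.Add(new IPEndPoint(address, port));
            }
        }

        return endpoints;
    }
}
```

Assuming your stream reader is a StreamReader called reader (a placeholder name), you could call it as EndpointExtractor.Extract(reader.ReadToEnd()). This still depends on the cells being plain <th> elements with no attributes, so an HTML parser remains the more robust choice if the markup can vary.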

Now go get your stake and your silver bullets, they're coming for us!!

Daniel Persson