When my button1 is clicked, it runs this:

 MatchCollection matchCollection = new Regex(@"(?<=/&gt;)\d+").Matches(new StreamReader(((HttpWebResponse)((HttpWebRequest)WebRequest.Create("http://www.proxyserverlist24.top/feeds/posts/default")).GetResponse()).GetResponseStream()).ReadToEnd());

Basically, it requests http://www.proxyserverlist24.top/feeds/posts/default and tries to extract the numbers between /%gt; and %lt;br, for example:

/%gt;103.12.161.1:65103%lt;br /%gt;103.16.61.134:8080%lt;br /%gt;103.21.77.106:8080%lt;br

How do I go about grabbing those numbers?

Denis
  • Don't use RegEx to parse HTML. – Sani Huttunen Jan 27 '18 at 16:01
  • [Parsing HTML with RegEx](https://stackoverflow.com/q/1732348/1070452) You can't have done much research, because that is one of the most upvoted posts on the site. – Ňɏssa Pøngjǣrdenlarp Jan 27 '18 at 16:02
  • My code worked before, on a different site. Since that site went down I have to change the regex values – Denis Jan 27 '18 at 16:04
  • Try https://regex101.com/r/jsHTaq/1 – Srdjan M. Jan 27 '18 at 17:26
  • No one is trying to parse HTML in this question. I wonder if anyone even bothers to look up the meaning of *parse*, or at least to read the post they are quoting. Regex is great for searching text and retrieving or replacing based on patterns. Regex is a great solution to the question asked. – melwil Jan 27 '18 at 17:46
  • @Denis: If all you want is each number, why not just use (\d+) instead of worrying about "%gt," or "%lt,br" ? – Mark Benningfield Jan 27 '18 at 18:39
  • @MarkBenningfield OP is trying to extract proxy addresses in the form `x.x.x.x:xxxx` – EZI Jan 27 '18 at 20:25
  • @EZI: I understand that, but the regex pattern shown captures them one numeric element at a time. It's unclear if they want each element, or each address in a capture group. – Mark Benningfield Jan 27 '18 at 20:30
  • @S.Kablar That pattern doesn't work: the site I linked has many more numbers that are not between %gt and %lt,br. When I click a button, it should extract all the proxies from the site and display them in the listbox. – Denis Jan 27 '18 at 20:38

1 Answer

There's no need for Regex. Your link returns XML, so you can use an XML parser for the feed and an HTML parser (HtmlAgilityPack) for the text of each "content" element. The final code is:

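// Needs: using System.Collections.Generic; using System.Linq; using System.Net;
// using System.Net.Http; using System.Xml.Linq; plus the HtmlAgilityPack package.
// Because of the await, this must run inside an async method (e.g. an async click handler).
IPAddress tempip;
int port;
List<IPEndPoint> proxies = null;

using (var client = new HttpClient())
{
    var doc = new HtmlAgilityPack.HtmlDocument();
    XNamespace ns = "http://www.w3.org/2005/Atom";
    var xml = await client.GetStringAsync("http://www.proxyserverlist24.top/feeds/posts/default");
    var xDoc = XDocument.Parse(xml);
    proxies = xDoc.Descendants(ns + "entry")
        .Select(x => (string)x.Element(ns + "content"))
        .Where(x => x != null) // skip entries that have no content element
        .SelectMany(x =>
        {
            // Each entry's content is HTML; load it and look at the innermost spans.
            doc.LoadHtml(x);
            // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
            return (doc.DocumentNode.SelectNodes("//span[not(span)]") ?? Enumerable.Empty<HtmlAgilityPack.HtmlNode>())
                        .SelectMany(n => n.Descendants())
                        .Select(n => n.InnerText.Split(":".ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
                        .Where(n => n.Length == 2)               // keep only "ip:port" shaped text
                        .Where(n => IPAddress.TryParse(n[0], out tempip))
                        .Where(n => int.TryParse(n[1], out port))
                        .Select(n => new IPEndPoint(IPAddress.Parse(n[0]), int.Parse(n[1])));
        })
        .ToList();
}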

In fact, a shorter regex solution is also possible, but as mentioned in the comments, it is not a good idea to use regex to parse XML or HTML.
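For comparison, the shorter regex route might look roughly like the sketch below. It is only an illustration: the class name ProxyExtractor, the method Extract and the pattern itself are hypothetical and not taken from the question or from any library. It assumes the downloaded feed text contains addresses written as ip:port, escapes the dots so they match only a literal '.', and leaves the final validation to IPAddress.TryParse and a port range check.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ProxyExtractor
{
    // Four dot-separated groups of 1-3 digits, a colon, then a 1-5 digit port.
    // The dots are escaped; an unescaped '.' would match any character.
    private static readonly Regex EndPointPattern =
        new Regex(@"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b");

    // Returns every ip:port pair in the text that also passes validation.
    public static List<IPEndPoint> Extract(string text)
    {
        var result = new List<IPEndPoint>();
        foreach (Match m in EndPointPattern.Matches(text))
        {
            // The pattern only checks the shape; out-of-range octets or ports
            // are rejected here.
            if (IPAddress.TryParse(m.Groups[1].Value, out IPAddress ip)
                && int.TryParse(m.Groups[2].Value, out int port)
                && port >= IPEndPoint.MinPort && port <= IPEndPoint.MaxPort)
            {
                result.Add(new IPEndPoint(ip, port));
            }
        }
        return result;
    }
}
```

Calling ProxyExtractor.Extract on the raw response string would then give a list that can be added to the listbox. Unlike the parser-based code, this does not care whether the surrounding markup is escaped or not, which is convenient but also means it will pick up any ip:port shaped text anywhere in the feed.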

EZI
  • I solved it more simply with Regex: new Regex("(?<=>)([0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}:[0-9]{1,5})(?=<br)") – Denis Jan 27 '18 at 21:59