
I'm attempting to use regex to grab a URL from some HTML but it isn't working.

<h3 class="(.*?)"><a onmousedown="(.*?)" href="(.*?)">(.*?)</a></h3>

Could someone help me with this and explain it? I'm not the best with regex, and I would like to actually see where I went wrong.

EDIT:

I'm not "grabbing" code from anywhere. I have known about regex for a long time but I've never done much with it. I figured it could come in handy for this project, so I gave it a shot. Here is my code:

    static void Main(string[] args)
    {
        WebClient wc = new WebClient();
        String html = wc.DownloadString("http://www.example.com/");
        foreach (String result in match("<h3 class=\"(.*)\"><a onmousedown=\"(.*)\" href=\"(.*)\">(.*)</a></h3>", html))
        {
            Console.WriteLine("result: " + result);
        }
        Console.ReadKey();
    }


    public static ArrayList match(string regex, string html, int x = 0)
    {
        ArrayList l = new ArrayList();
        foreach (Match m in new Regex(regex, RegexOptions.Multiline).Matches(html))
        {
            l.Add(m.Groups[x].Value);
        }
        return l;
    }
user1728017
  • why the heck are you guys downvoting this? +1 to counteract trolls. – System.Cats.Lol May 23 '13 at 01:08
  • Where is the HTML coming from? Do you know it's exactly going to be like `some crap here`?
    – cHao May 23 '13 at 01:09
  • @System.Cats.Lol: It's probably being downvoted because there are far, *far* more reliable ways to handle the general case than with regular expressions. Particularly when you're trying to do it with a regex that doesn't even attempt to ignore HTML tag and quote characters. – cHao May 23 '13 at 01:10
  • Show us the code where you are using this and what you are testing with this regex – lahsrah May 23 '13 at 01:12
  • @cHao, That doesn't make it a bad question. – Brad May 23 '13 at 01:13
  • 2
    @System.Cats.Lol - please edit the question with good explanation of "but it isn't working." since you decided to vote it up as good question. So far it looks like "I grabbed some random piece of code - figure out if it does what I want and fix if it is not". – Alexei Levenkov May 23 '13 at 01:14
  • @Brad: It does make it "not useful", though. If someone answers the question asked, they're helping people paint themselves into a corner. – cHao May 23 '13 at 01:17
  • 1
    @cHao, No, it doesn't. A useful answer to this question is one that shows the proper way to get at those attribute values. The question, despite its wording, is not about RegEx. It's about parsing HTML with C#. If there is a reason to downvote, Alexei hit the nail on the head. – Brad May 23 '13 at 01:18
  • @Brad: The *correct* answer would indeed be "don't do that" and a demo of how to use an HTML parser. A *useful* answer, on the other hand, would actually answer the question asked, which very plainly *is* about regex. Unfortunately, the way the question is worded, it seems to me an answer can't be both useful and correct. – cHao May 23 '13 at 01:47
  • I'm not "grabbing" code from anywhere, I took my best shot at trying to use regex to parse some links on a page. I don't know what you don't get about "It isn't working", when I compile and execute my application it doesn't find any regex matches. Do you understand that? – user1728017 May 23 '13 at 01:58
  • 2
    @user1728017: We do understand, and we're trying to help you. As noted in other answers to that question that Brad and I already linked to, HTML is Chomsky Type 2, and regexes are only kitted out to parse Chomsky Type 3 grammars ([more info on the Chomsky hierarchy](https://en.wikipedia.org/wiki/Chomsky_hierarchy#The_hierarchy)). Just use an HTML parser - it's probably not as hard as you think, and will save you a lot of trouble down the road. – michaelb958--GoFundMonica May 23 '13 at 02:07
  • @michaelb958 Alright. – user1728017 May 23 '13 at 02:13
  • @michaelb958: Modern regular expressions aren't your grandaddy's regexes (or the ones they use in school). They can actually handle some non-regular grammars (usually with a lot of effort, but eh). But you're right in that an HTML parser would be considerably less trouble. Here's some code that builds a [regex for XML](https://snipt.net/xanatos/regex-to-tokenize-an-xml/)...just because someone decided to be a freak. :) – cHao May 23 '13 at 02:15
  • 1
    @cHao I'm aware of the *incredible* power available in regexes these days. However, in my experience an HTML parser looks so much less like line noise (meaning easier debuggability when something breaks, which is practically inevitable here). – michaelb958--GoFundMonica May 23 '13 at 02:20
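
To make the point about quote characters and capture groups concrete, here is a minimal sketch that uses only `System.Text.RegularExpressions`. It is not a replacement for the HTML parser the commenters recommend, and it assumes the markup really has exactly this shape (same attributes, same order, double quotes), which the real page may not. The class name `LinkExtractor`, the helper `ExtractGroup`, and the sample HTML are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // [^"]* stops at the closing quote, so an attribute value can never
    // swallow the rest of the tag the way a greedy (.*) can.
    private static readonly Regex HeadingLink = new Regex(
        "<h3 class=\"([^\"]*)\"><a onmousedown=\"([^\"]*)\" href=\"([^\"]*)\">(.*?)</a></h3>",
        RegexOptions.Singleline); // Singleline lets '.' match newlines in the link text

    // Returns one capture group per match. Group 0 is the whole match;
    // group 3 is the href value in this pattern.
    public static List<string> ExtractGroup(string html, int group = 3)
    {
        var results = new List<string>();
        foreach (Match m in HeadingLink.Matches(html))
        {
            results.Add(m.Groups[group].Value);
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample input, not taken from the real page.
        string html =
            "<h3 class=\"r\"><a onmousedown=\"track()\" href=\"http://a.example/\">First</a></h3>" +
            "<h3 class=\"r\"><a onmousedown=\"track()\" href=\"http://b.example/\">Second</a></h3>";

        foreach (string url in ExtractGroup(html))
        {
            Console.WriteLine("result: " + url);
        }
    }
}
```

Two details differ from the code in the question. The question's `match` helper defaults to group 0, which is the entire matched text rather than the URL, and `RegexOptions.Multiline` only changes how `^` and `$` behave, so it has no effect on a pattern that uses neither. If the real markup has extra attributes, single quotes, or a different attribute order, this pattern will find nothing, which is the fragility the comments warn about.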

0 Answers