I already have a regex that looks for a div tag with class=something. However, this div might occur more than once, one after another. You can't simply put square brackets around the existing regex (e.g. [:some complicated regex logic already existing:]*), so how do you repeat it in regex? I'd like to avoid using programming-language logic to append the regex to itself if I can.

Thanks

tom
  • Uh, you're not trying to parse HTML with a regex, are you...? – Ignacio Vazquez-Abrams Jun 05 '10 at 07:49
  • @Ignacio, I am afraid this is exactly what the OP is trying to do. Here's an interesting post explaining why this is bad: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Darin Dimitrov Jun 05 '10 at 07:52
  • What language are you using? You might get more specific answers if you specify the language. – Mark Byers Jun 05 '10 at 08:16
  • I tried xpath/xquery, but the webpage's HTML is not structurally sound. I tried converting it to XML with tools like Tidy, but there were errors in the HTML. I'm only trying to parse 3 pieces of information from the HTML. – tom Jun 05 '10 at 19:43

1 Answer

Don't parse HTML with regexen! Seriously, it's literally impossible in the general case.

To answer your regex question: if you have some arbitrarily complex regex R, you can do the following things with it:

  • (R) matches R and stores it in a capturing group.
  • (?:R), if supported by your regex engine, matches R without storing it in a capturing group.
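As a rough illustration of the difference between the two forms, here is a minimal Python sketch. The pattern R below is a hypothetical stand-in for the existing complex regex, not anything from the question, and Python's re module is just one engine that supports (?:...).

```python
import re

# Hypothetical stand-in for the existing complex regex R.
R = r"\d+-\d+"

# (R): matches R and stores the match in capturing group 1.
capturing = re.match(f"({R})", "12-34")

# (?:R): matches R but creates no capturing group.
non_capturing = re.match(f"(?:{R})", "12-34")

captured_groups = capturing.groups()
uncaptured_groups = non_capturing.groups()
```

Both calls match the same text; only the first one records it as a group.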

In other words, parentheses group, and a quantifier such as * or + placed after a group applies to the whole group; square brackets, on the other hand, are for character classes only. You probably want something like (?:<div class="something">\s*)+ (with a better regex for your div): match the div followed by any amount of whitespace, and find that one or more times. But please reconsider using regexen for this: while they're a handy tool for many things, HTML is not one of them.
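The grouping-plus-quantifier idea can be sketched in Python as follows. The div pattern and the sample HTML are assumptions made up for illustration; the lazy .*? body is a simplification that would break on nested divs, which is one reason a real HTML parser is the safer choice.

```python
import re

# Hypothetical stand-in for the existing complex regex R (one div with its body).
R = r'<div class="something">.*?</div>'

# Wrap R (plus optional trailing whitespace) in a non-capturing group
# and require one or more consecutive repetitions.
repeated = re.compile(f"(?:{R}\\s*)+", re.DOTALL)

# Made-up sample input with two adjacent matching divs.
html = '<p>x</p><div class="something">a</div>\n<div class="something">b</div><p>y</p>'

m = repeated.search(html)
run = m.group(0)                       # the whole run of consecutive divs
items = re.findall(R, run, re.DOTALL)  # the individual divs inside that run
```

Here the f-string only splices R into the larger pattern for readability; writing the full pattern out literally would behave the same way.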

Antal Spector-Zabusky