1

I am not very good at regular expressions, but I want to use a single regular expression that matches both:

  • <span>
  • </span>

Any suggestion?

Wesley Murch
Derek 朕會功夫
  • 7
    @Mimi If I could, I would give you a negative mark. The OP never said anything about parsing HTML, he said he wanted a regex capable of matching two strings. Don't see anything wrong with that. `</?span>` will do the job perfectly. – NullUserException Aug 26 '11 at 01:06
  • 1
    Question is not specific enough. Ok, you matched those... now what? Are you looking for those tags, or what is between them? The regex itself is so trivial it seems that what you are really saying is that you don't want to take the time to learn even the most basic aspects of using them? Also DOM parsers or xml parsers could easily be a much better solution as suggested by Mimi and Slaks. – gview Aug 26 '11 at 01:11

1 Answer

9

</?span>

However, you shouldn't parse HTML using regular expressions.
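
As a minimal sketch of how the pattern above could be used, assuming JavaScript (the question is tagged that way): the `?` makes the preceding `/` optional, so one pattern covers both the opening and the closing tag. The names `spanTag`, `sample`, and `found` are made up for illustration, and the input string is hypothetical.

```javascript
// In a JavaScript regex literal the forward slash has to be escaped,
// so the pattern </?span> is written as <\/?span>.
const spanTag = /<\/?span>/g;

// Hypothetical sample input containing span tags and unrelated tags.
const sample = "<span>hello</span> <div>world</div>";

// With the g flag, match() returns every occurrence of the pattern.
const found = sample.match(spanTag);
console.log(found); // [ '<span>', '</span>' ]
```

Note that this matches only the bare tags; an opening tag with attributes, such as `<span class="x">`, would not match as written.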

Community
SLaks
  • 1
    Downvote for the annoying joke reference. Next time just answer the question, please. – tchrist Aug 26 '11 at 01:10
  • Why the downvote? His regex works fine: http://regexpal.com/?flags=&regex=%3C%2F%3Fspan%3E&input=%3Cspan%3E%0A%3C%2Fspan%3E – Paul Aug 26 '11 at 01:10
  • @tchrist Oh, you already explained the downvote. He has a point with the link though, and it is a good idea to warn the OP not to parse HTML with regex. He did say it was a tag after all, so he could be trying – Paul Aug 26 '11 at 01:11
  • I wish that link was not the de facto post about this subject (referring to bobince's post). While possibly amusing - it is not helpful and there are answers there with hundreds of upvotes that *are* in fact using regex to parse HTML. `"Regexes worked just fine for me, and were very fast to set up."` - 462 votes – Wesley Murch Aug 26 '11 at 01:15
  • 3
    @PaulPRO: There is nothing wrong with applying regexes to HTML, and it is offensive to shove that stoopid joke posting at people without actually helping them. When you are in your editor *editing an HTML file* and you type `250-300s!<br>!!g` to remove the breaks from lines 250 through 300, you have just used a regex on HTML. So what?
    – tchrist Aug 26 '11 at 01:15
  • @tchrist: That's not _parsing_. Regexes are fine for _manipulating_ HTML (up to a certain point, anyway), but if you want to parse a complex hierarchy, they're the wrong choice. – SLaks Aug 26 '11 at 01:42
  • @tchrist I said parse, not edit. There is something wrong with parsing HTML with regex. There is a lot wrong with it, it is literally 100% impossible. HTML is not a regular language. – Paul Aug 26 '11 at 01:53
  • 1
    @tchrist SLaks did not shove that link at anyone without helping them. He helped them, gave them a regular expression that solves their problem, and then warned them against the dangers of trying to parse HTML with regex just in case that is what they are trying to do. – Paul Aug 26 '11 at 01:54
  • 1
    Perhaps we should find a more helpful link for how to parse HTML, rather than laugh at people for suggesting it, when they may not know better, by directing them to a joke post where the author with 4000 upvotes is clearly frustrated by the question and offers nothing particularly useful. I've been here about a year and the meme got old very quickly. – Wesley Murch Aug 26 '11 at 02:06
  • @Wesley: That depends on the language. Had the OP specified the language, I would have mentioned HAP/BS/JS/whatever. – SLaks Aug 26 '11 at 02:15
  • It's tagged javascript in this case. I guess this is one of those cases where "use jQuery" may be a good answer, but of course it's not clear what OP really wants to do. – Wesley Murch Aug 26 '11 at 02:18
  • 1
    @PaulPro That's pretty pedantic. HTML isn't a regular language, but no modern regex implementation is truly regular. It is *not* impossible to parse HTML with regex. – NullUserException Aug 26 '11 at 07:17
  • @NullUser It is entirely impossible with a true regex, and the only language which has a regex engine powerful enough to parse HTML is Perl, and even then I doubt anyone has ever done it, or is ever going to successfully. – Paul Aug 26 '11 at 07:25
  • Oh, you are talking to him. He claims (and you better believe him) to be *"perfectly capable and willing to write regexes that are dynamically self-modifying recursive-descent parsers"*. See [this](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491) and [this](http://stackoverflow.com/questions/4284176/doubt-in-parsing-data-in-perl-where-am-i-going-wrong/4286326#4286326) – NullUserException Aug 26 '11 at 07:26
  • @NullUser Hmm, interesting links, but if you look at his code, he clearly uses native Perl structures like loops and such. Perl is (obviously) Turing complete, so of course you can parse HTML in it. He's using regex as a significant portion of his code, but he has many regexes, not one, and he is not "parsing HTML with regex alone", which is what people mean when they say that regex isn't meant to do it. I'm guessing all HTML parsers use regex in places, but it is impossible to parse HTML with a single regular expression, which is what most questions of this kind are about. – Paul Aug 26 '11 at 07:41
  • @PaulPro Do not tell me what is impossible; if you give me a BNF grammar, I can always give you a single pattern to parse strings that conform to that BNF. As for using loops, yes I did that because it was what the problem at hand called for; so what? The evil eye — and evil finger — people here are slapped with just for mentioning HTML and pattern matching in the same posting is wholly FITH. None of these folks wants a full parse tree; they just want to diddle some HTML, and regexes are just fine for that. Wesley has the right of it here: the dumb joke posting will get my downvote every time. – tchrist Aug 26 '11 at 12:53
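
The matching-versus-parsing distinction argued in the comments above can be illustrated with a small sketch, again assuming JavaScript. Everything here is hypothetical: the input string and the names `nestedHtml`, `naivePair`, `paired`, `tagScan`, and `spansBalanced` are made up for the example. It does not settle whether a sufficiently powerful regex engine could parse HTML; it only shows a simple single pattern mis-pairing nested tags, and a loop plus the tag-matching regex tracking nesting instead.

```javascript
// Hypothetical input: one span nested inside another.
const nestedHtml = "<span>outer <span>inner</span> tail</span>";

// Naive attempt to capture a whole element with one simple pattern.
const naivePair = /<span>(.*?)<\/span>/;
const paired = nestedHtml.match(naivePair);

// The lazy quantifier stops at the first closing tag, so the outer
// opening tag gets paired with the inner closing tag.
console.log(paired[1]); // "outer <span>inner"

// Matching individual tags and tracking depth in ordinary code
// handles the nesting that the single pattern above gets wrong.
const tagScan = /<\/?span>/g;

function spansBalanced(html) {
  let depth = 0;
  for (const m of html.matchAll(tagScan)) {
    depth += m[0] === "<span>" ? 1 : -1;
    if (depth < 0) return false; // closing tag with no open span
  }
  return depth === 0;
}

console.log(spansBalanced(nestedHtml)); // true
console.log(spansBalanced("<span></span></span>")); // false
```

For real documents in a browser, a DOM-based approach, as suggested in the comments, is likely the more robust choice.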