I'm using the following RegEx to strip HTML tags from a string:

<[^>]*>

This does not get rid of blank lines, however. I found this other RegEx that successfully removes any blank lines:

[#Chr(13)##Chr(10)#]+

I tried to combine the two like this:

ReReplaceNoCase(arguments.string, "(<[^>]*>)([#Chr(13)##Chr(10)#]+)", "", "ALL")

But this does not work. I'm using ColdFusion to do this, which should explain the # signs.

I thought parentheses were used to group expressions in RegEx, but grouping does not seem to work in my attempt to combine the two.

Mohamad
  • Why do you need to combine them into a single regular expression? Can't you do it in two steps? – Mark Byers Dec 27 '10 at 16:18
  • mandatory read: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Mauricio Scheffer Dec 27 '10 at 16:19
  • @Mark, I can, but isn't it more efficient to run the string through one function rather than two? If not, why? – Mohamad Dec 27 '10 at 16:22
  • Have you measured the performance of the two-step solution and found it to be the bottleneck in your application? If not, you might be prematurely optimizing. – Mark Byers Dec 27 '10 at 16:28

2 Answers

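Your attempt concatenates the two groups, so it only matches a tag that is immediately followed by line breaks. Assuming that the two regular expressions you have work as you want, you can combine them using an alternation instead: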

<[^>]*>|[#Chr(13)##Chr(10)#]+
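
To make the difference concrete, here is a minimal sketch in Python rather than ColdFusion (the regex semantics are the same for these patterns). `TAG`, `NEWLINES` and the sample string are illustrative names and data, not taken from the question; `[\r\n]+` stands in for `[#Chr(13)##Chr(10)#]+`.

```python
import re

# Python stand-ins for the two ColdFusion patterns from the question.
TAG = r"<[^>]*>"
NEWLINES = r"[\r\n]+"  # Chr(13) is \r, Chr(10) is \n

# Concatenation: matches only a tag that is immediately followed by line breaks.
concatenated = re.compile(f"({TAG})({NEWLINES})")
# Alternation: matches a tag OR a run of line breaks, wherever each occurs.
alternated = re.compile(f"{TAG}|{NEWLINES}")

sample = "<p>Hello <b>world</b></p>\r\n\r\n<p>Bye</p>"

print(repr(concatenated.sub("", sample)))  # only the "</p>" before the line breaks is removed
print(repr(alternated.sub("", sample)))    # every tag and every line break is removed
```

Note that the alternation deletes the line breaks outright, so text on adjacent lines ends up joined together; replacing them with a space would need a separate step.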

I strongly suspect, though, that the regular expressions you have posted don't in fact work correctly. I'd advise against using regular expressions to parse HTML, since HTML is not a regular language. Use an HTML parser instead.
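
As a rough illustration of the parser route, here is a sketch using Python's standard `html.parser` module; a ColdFusion solution would need an equivalent library. `TextExtractor` and `strip_tags` are hypothetical names, and collapsing all whitespace into single spaces is an assumption that suits a `title` attribute, not something the question asks for.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; entities are already decoded.
        self.parts.append(data)


def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Assumption: collapse whitespace runs (including blank lines) to one space.
    return " ".join("".join(parser.parts).split())


print(strip_tags('<a title="1 > 0">Hello</a>\r\n\r\n<p>big &amp; small</p>'))
```

Unlike `<[^>]*>`, the parser is not confused by a `>` inside a quoted attribute value, and it decodes entities such as `&amp;`. Text inside `<script>` or `<style>` elements would still be collected by this sketch.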

Mark Byers
  • Mark, all I want to do is get rid of HTML in a string in order to use it in the "title" attribute of a span or an anchor; I should not use regular expressions for this? – Mohamad Dec 27 '10 at 16:23
  • @Mel: Your question is like saying "All I want to do is *remove* some screws - not screw them in. And they're only *small* screws so I don't think I need a screwdriver. Won't a hammer work in this case?" - Well, it *might* work *some* of the time, but it's still not the easiest way to do it. – Mark Byers Dec 27 '10 at 16:25
  • I got it. I'll try and find a function in ColdFusion that does this. I'm just so used to seeing people use RegEx for this purpose I thought it was the right thing to do. – Mohamad Dec 27 '10 at 16:36
stripcr(ReReplaceNoCase(arguments.string, "(<[^>]*>)", "", "ALL"))
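
One caveat: `StripCR` removes carriage returns only, so line feeds (and therefore blank lines) can survive. A sketch of the same two-step idea in Python, with an extra step that collapses repeated line feeds; that extra step and the function name `strip_tags_then_blank_lines` are assumptions, not part of the one-liner above.

```python
import re


def strip_tags_then_blank_lines(text):
    # Step 1: remove tags with the same pattern as the one-liner above.
    without_tags = re.sub(r"<[^>]*>", "", text)
    # Step 2: what StripCR does, i.e. drop carriage returns only.
    without_cr = without_tags.replace("\r", "")
    # Step 3 (assumption): collapse runs of line feeds so no blank lines remain,
    # then trim line feeds at both ends.
    return re.sub(r"\n{2,}", "\n", without_cr).strip("\n")


print(repr(strip_tags_then_blank_lines("<p>Hello</p>\r\n\r\n<p>Bye</p>")))
```

Lines that contain only spaces or tabs are not treated as blank by this sketch.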
KobbyPemson