
I'm working on a regex pattern to sanitize HTML output and remove any special characters. My idea is to write a regex that lists all the characters I want to keep and removes everything else, rather than trying to account for every special character in the pattern.

My current pattern:

/[^0-9A-Za-z,=": ?'`&;>|<!.\-\/]/

It's working great, except that it removes parentheses (), which I'd like to keep. I can't seem to escape them correctly when adding them to my pattern. What is the best way to do this?

ericalli
  • Related question: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Andrew Grimm Jun 16 '11 at 23:40

3 Answers

/[^0-9A-Za-z,=": ?'`&;>|<!.\-\/()]/

Inside a character class "[]", different escaping rules apply: ( and ) are ordinary characters there, so they don't need a backslash.
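A minimal Ruby sketch of this pattern used with String#gsub. The helper name strip_disallowed and the sample input are illustrative assumptions, not from the question; the point is that ( and ) survive while characters outside the set are removed.

```ruby
# Every character NOT listed in the class is removed.
# Inside [...], ( and ) are literal, so they need no backslash.
DISALLOWED = /[^0-9A-Za-z,=": ?'`&;>|<!.\-\/()]/

# Hypothetical helper name, used only for this example
def strip_disallowed(text)
  text.gsub(DISALLOWED, "")
end

puts strip_disallowed("f(x) = 42; #tag {ok}") # => f(x) = 42; tag ok
```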

vstrien
  • Although escaping the `(`- and `)`'s: `/[^0-9A-Za-z,=": ?'\`&;>|<!.\-\/\(\)]/`, will also work (although not needed, as you mentioned). Makes me wonder _how_ the OP was escaping them... – Bart Kiers Jun 16 '11 at 13:03
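To check this comment's point, the sketch below runs the unescaped class and the backslash-escaped class over the same made-up sample string; both should strip exactly the same characters.

```ruby
unescaped = /[^0-9A-Za-z,=": ?'`&;>|<!.\-\/()]/
escaped   = /[^0-9A-Za-z,=": ?'`&;>|<!.\-\/\(\)]/

sample = "a(b)c [d] {e} -f/g"

# [, ], { and } are outside both sets and are removed; ( ) stay either way
puts sample.gsub(unescaped, "") # => a(b)c d e -f/g
puts sample.gsub(escaped, "")   # => a(b)c d e -f/g
```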

The best way is to use the sanitize helper built into Rails (ActionView::Helpers::SanitizeHelper).

Mark Thomas
  • Upvoting this. It's a well tested method. Trying to build your own regex will work in many cases, but not in all. For example, someone could put in: pt>. Your removal of the middle script tag would result in another script tag being rendered. – agmcleod Jun 16 '11 at 13:02
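The payload in the comment above is partly lost. The string below is a hypothetical reconstruction of the same idea (a script tag split around another script tag), sketching why a single pass of plain string removal is not enough.

```ruby
# Hypothetical payload: a <script> tag split around another <script> tag
payload = "<scr<script>ipt>alert(1)</script>"

# Naive single pass: delete every literal "<script>"
once = payload.gsub("<script>", "")

# The surrounding fragments join up into a fresh, working tag
puts once # => <script>alert(1)</script>
```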
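# 0-9 keeps zero too; the trailing - is a literal hyphen (between two characters it would form a range)
str.delete( %q{^a-zA-Z0-9,=:"`&;>|<!./ ()'?-} )
# or with another delimiter (*):
str.delete( %q*^a-zA-Z0-9,=:"`&;>|<!./ ()'?-* )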

String#delete takes one or more strings as arguments; a leading '^' negates the set, just like in a regex character class. With the %q{string} syntax you don't have to worry about escaping quotes or slashes.
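A sketch of the String#delete approach, using a hypothetical helper name. It assumes you want the same set as the question's regex plus parentheses, so it uses 0-9 (not 1-9), includes ?, and puts the hyphen last, where String#delete reads it as a literal character rather than as a range.

```ruby
# Leading ^ negates the set: delete everything NOT listed.
# The final - is literal; between two characters it would form a range.
KEEP = %q{^a-zA-Z0-9,=:"`&;>|<!./ ()'?-}

# Hypothetical helper name, used only for this example
def strip_with_delete(text)
  text.delete(KEEP)
end

puts strip_with_delete("f(x) = 42; #tag {ok} a-b/c?") # => f(x) = 42; tag ok a-b/c?
```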

steenslag
  • You still have to worry about escaping, right? Just not about escaping forward slashes. If he wanted to have { } in his expression, he'd need to escape them, correct? – yarian Jun 16 '11 at 15:22
  • @YGomez Yes. But you can choose the delimiter yourself. Updating answer. – steenslag Jun 16 '11 at 15:33
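A short illustration of the delimiter choice discussed in these comments, with made-up strings: inside %q{...}, balanced braces nest without escaping, an unbalanced brace needs a backslash, and picking a different delimiter such as * avoids the issue entirely.

```ruby
# Balanced braces nest inside %q{...} without escaping
a = %q{keep {these} braces}
# An unbalanced brace needs a backslash (the backslash is dropped)
b = %q{only an opening \{ here}
# Or choose a delimiter that does not occur in the string
c = %q*braces {} without escaping*

# Applied to the delete approach: keep lowercase letters and braces
d = "f{x}#".delete(%q*^a-z{}*)

puts a, b, c, d
```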