
I am using the Struts 1.3.1 validator (validation.xml, with a mask and a regular expression), which doesn't support Unicode regular expressions. (I have read about this and confirmed it by trying.)

With Unicode support, a whitelist would look something like this:

^[\p{L}\p{P}\p{Zs}]+

In my case, however, I need to filter out all "helping characters" and keep only letters.

Does anyone have an idea for a blacklist regular expression that would meet my needs?

I thought of this one, but it obviously doesn't cover everything:

^[^&^>^/^<^\\^*^\?^%^:]$
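A quick check of that attempt, as a minimal sketch. It uses `java.util.regex` as a stand-in, which is an assumption, since the Struts mask runs on its own regex engine; the class name `CaretInClass` is made up for illustration. Inside a character class only the first `^` negates, so the later ones are literal carets, and without a quantifier the pattern can only match a single character:

```java
import java.util.regex.Pattern;

// Hypothetical helper; java.util.regex is assumed here, not the Struts engine.
final class CaretInClass {
    // The attempt as posted: every ^ after the first is a literal caret,
    // and there is no quantifier after the class.
    static final Pattern POSTED = Pattern.compile("^[^&^>^/^<^\\\\^*^\\?^%^:]$");

    // The same exclusions listed once each, with + so longer values can match.
    static final Pattern TIDIED = Pattern.compile("^[^&>/<\\\\*?%:]+$");

    // True when the whole string matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(POSTED, "a"));   // one allowed character
        System.out.println(matches(POSTED, "ab"));  // two characters
        System.out.println(matches(POSTED, "^"));   // caret is part of the excluded set
        System.out.println(matches(TIDIED, "ab"));  // quantifier allows longer input
        System.out.println(matches(TIDIED, "a<b")); // still rejects <
    }
}
```

Under these assumptions the posted form accepts `a` but rejects both `ab` and `^`, while the tidied form accepts `ab` and rejects `a<b`.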

Thanks a lot!

Northern Pole
  • What are "helping characters"? Are you basically saying you want to match unicode letters? – MDEV Sep 11 '13 at 12:15
  • Yes. I am trying to match all unicode letters, without using the unicode stuff that isn't supported in struts. – Northern Pole Sep 11 '13 at 12:18
  • The character class you wrote is actually `[^aglmpt/:;\\*?%^&]`, because a character class treats a 'string' as a set of chars. Use negative lookahead instead, like `(?!^&|^>|^/|^<|^\\|^*|^\?|^%|^:)` – davide Sep 11 '13 at 12:40
  • Thanks for your comment. I didn't understand the equivalence you presented, davide. I also edited my original expression a bit, as I had a typo there. – Northern Pole Sep 11 '13 at 13:58
  • You generally shouldn't be doing string manipulations on the HTML-encoded form of a value—validate it raw first, do the HTML-escaping later. Also you have `\p{P}` in your target whitelist and that includes all the characters you're trying to exclude, so I'm a bit confused. – bobince Sep 11 '13 at 14:31
  • bobince, I cannot use \p{P}, as the Struts validator doesn't support it (at least in the version I need to use). So I am trying to filter out bad input, that's all. I didn't do any string manipulation. Am I missing something? – Northern Pole Sep 11 '13 at 15:19
  • Well, you appear to be trying to exclude `&`, but you won't have HTML-escaped input unless you have specifically escaped it yourself for some reason (there are reasons people do sometimes HTML-escape all input, but they are very bad reasons). As for `\p{P}` it would *allow* all the characters `&>/<\*?%`—is that really what you want? – bobince Sep 11 '13 at 20:11

1 Answer


This is the solution I chose in the end:

Note: this is Struts 1.3.1 syntax for a validator mask.

^[^&amp;&gt;&lt;\\*?%:!&quot;#$()+,;=@\[\]{}~\^|`\n\t\r/]+$

It disallows the listed special characters and allows everything else.
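To see what this blacklist accepts and rejects, here is a minimal sketch. It makes two assumptions: that the entities in the mask (`&amp;`, `&gt;`, `&lt;`, `&quot;`) are XML escapes needed inside validation.xml and stand for the plain characters, and that `java.util.regex` behaves closely enough to the engine behind the Struts mask for this character class. The class name `BlacklistMask` is made up for illustration.

```java
import java.util.regex.Pattern;

// Sketch only: java.util.regex stands in for the engine behind the Struts mask.
final class BlacklistMask {
    // The mask with its XML entities decoded (assumed to be XML escapes):
    // &amp; becomes &, &gt; becomes >, &lt; becomes <, &quot; becomes "
    private static final Pattern MASK = Pattern.compile(
        "^[^&><\\\\*?%:!\"#$()+,;=@\\[\\]{}~\\^|`\\n\\t\\r/]+$");

    // True when the value is non-empty and contains no blacklisted character.
    static boolean isAllowed(String value) {
        return MASK.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Jos\u00e9"));           // accented Latin letter
        System.out.println(isAllowed("\u65e5\u672c\u8a9e"));  // CJK ideographs
        System.out.println(isAllowed("<b>"));                 // contains < and >
        System.out.println(isAllowed("a&b"));                 // contains &
    }
}
```

Because the class is negated, any character not listed passes, including letters from every script, but also digits, spaces, hyphens, apostrophes and full stops. That is the trade-off of a blacklist compared with `\p{L}`.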

A whitelist approach was also considered, but it was dropped because it needed more work (adding languages other than the European ones and Japanese/Chinese):

^[a-zA-Z0-9\-'àÀâÂäÄáÁéÉèÈêÊëËìÌîÎïÏòóÒôÔöÖùúÙûÛüÜçÇ’ñß]+|[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[々〆〤]+$
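One thing to watch if this whitelist is ever revisited: alternation binds more loosely than the anchors, so `^` applies only to the first branch and `$` only to the last. Whether that matters depends on whether the engine searches inside the value or requires a full match. The sketch below uses `java.util.regex` with `find()` to imitate a searching engine, and a shortened stand-in pattern (`[a-z]` and `[0-9]` in place of the real branches); both are assumptions for illustration, as is the class name `AnchoredAlternation`.

```java
import java.util.regex.Pattern;

// Hypothetical demo with simplified branches; not the original whitelist.
final class AnchoredAlternation {
    // Same shape as the whitelist: ^ binds to the first branch, $ to the last.
    static final Pattern UNGROUPED = Pattern.compile("^[a-z]+|[0-9]+$");

    // Grouping the branches makes both anchors apply to every alternative.
    static final Pattern GROUPED = Pattern.compile("^(?:[a-z]+|[0-9]+)$");

    // find() searches for a match anywhere in the value, like a searching engine.
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(found(UNGROUPED, "abc<")); // first branch matches the prefix
        System.out.println(found(UNGROUPED, "<123")); // last branch matches the suffix
        System.out.println(found(GROUPED, "abc<"));   // anchors now cover every branch
        System.out.println(found(GROUPED, "abc"));    // a clean value still passes
    }
}
```

With a searching match, the ungrouped form lets `abc<` and `<123` through, while the grouped form rejects both and still accepts `abc`.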
Northern Pole