7

I want to pre-validate an input form with the new HTML5 pattern attribute. My data is a domain name, so the built-in validation of <input type="url"> doesn't apply.

But there is a problem: I can't use A-Za-z, because of those damned IDNs (internationalized domain names).

So the question: is there any way to use <input pattern=""> to validate arbitrary non-English letters?

I tried \w, of course, but it only works for Latin letters...

Maybe someone has a set of \xNN-\xNN ranges that is guaranteed to accept ALL Unicode alphabetic characters, or some other way?

edit: "This question may already have an answer here:" - no, there is no answer.

J_z
  • possible duplicate of [Regular expression to match non-english characters?](http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters) – Jukka K. Korpela Feb 08 '13 at 12:58
  • Since HTML5 `pattern` attribute uses JavaScript regexps, this is effectively a duplicate of a question about matching letters in JavaScript. – Jukka K. Korpela Feb 08 '13 at 12:59
  • I have no trouble understanding regex in general and have experience with it, so those topics are useless to me. I still haven't found a reasonable solution for my case. It seems to be a failure of the W3C spec makers. – J_z Feb 09 '13 at 01:58
  • The old question referred to has good practical answers to the question asked. HTML5 intentionally uses JavaScript regexp syntax and semantics in the `pattern` attribute. – Jukka K. Korpela Feb 09 '13 at 07:43

2 Answers

3

Based on my testing, the HTML5 pattern attribute supports Unicode code points in exactly the same way that JavaScript does (and doesn't):

  • Of the Unicode escapes, it only supports \u notation, so \u00a1 will match '¡'.
  • Because these escapes define characters, you can use them in character ranges like [\u00a1-\uffff].
  • . will match Unicode characters as well.
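
A minimal sketch of the points above in plain JavaScript, since the pattern attribute uses the same regexp syntax. The pattern string and the `matchesWhole` helper are my own illustrative names, not anything from the spec. The helper anchors the expression because the pattern attribute has to match the whole value.

```javascript
// Hypothetical pattern: ASCII letters, digits, hyphen,
// or any character in the range U+00A1..U+FFFF.
const labelPattern = "[a-zA-Z0-9\\-\\u00a1-\\uffff]+";

// Mimic the pattern attribute, which must match the entire value.
function matchesWhole(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

console.log(matchesWhole(labelPattern, "пример"));  // Cyrillic label
console.log(matchesWhole(labelPattern, "bücher"));  // Latin with umlaut
console.log(matchesWhole(labelPattern, "foo bar")); // contains a space
```

Note that a range like this is deliberately broad: it also lets through non-letter symbols inside U+00A1..U+FFFF and does not cover characters outside the Basic Multilingual Plane. Newer browsers reportedly compile pattern in a Unicode-aware mode where \p{L} is available, but check support before relying on it.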

You don't really specify how you want to pre-validate, so I can't help you much more than that, but by looking up the Unicode character values you should be able to work out what you need in your regex.

Keep in mind that pattern regex execution is rather dumb overall and isn't universally supported. I recommend progressive enhancement with some JavaScript on top of the pattern value (you can even re-use the regex more or less).

As always, never trust user input: it doesn't take a genius to make a request to your form endpoint and pass more or less whatever data they like. Your server-side validation should necessarily be more explicit. Your client-side validation can be more generous, depending on whether false positives or false negatives are more problematic for your use case.
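
Roughly what I mean by re-using the regex, as a sketch only. The names `makeValidator`, `domainPattern` and `isPlausibleDomain` are made up for illustration, and the selector assumes your page has a single input with a pattern attribute.

```javascript
// Build a validator from the same string used in the pattern attribute.
function makeValidator(pattern) {
  const re = new RegExp("^(?:" + pattern + ")$");
  return (value) => re.test(value);
}

// Hypothetical pattern; it would also go into <input pattern="...">.
const domainPattern = "[a-zA-Z0-9.\\-\\u00a1-\\uffff]+";
const isPlausibleDomain = makeValidator(domainPattern);

// In a browser, wire the check to the field. Skipped when there is no DOM.
if (typeof document !== "undefined") {
  const input = document.querySelector("input[pattern]");
  if (input) {
    input.addEventListener("input", () => {
      const ok = makeValidator(input.pattern)(input.value);
      // An empty message marks the field as valid again.
      input.setCustomValidity(ok ? "" : "Please enter a domain name.");
    });
  }
}
```

This way the script works even where the pattern attribute itself is ignored, and there is only one regex to maintain.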

skovacs1
0

I know this isn't what you want to hear, but...

The HTML5 pattern attribute isn't really for the programmer so much as it's for the user. So, considering the unfortunate limitations of pattern, you are best off providing a "loose" pattern--one that doesn't give false negatives but allows for a few false positives. When I've run into this problem, I found that the best thing to do was a pattern consisting of a blacklist + a couple minimum requirements. Hopefully, that can be done in your case.
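
To give an idea of what "blacklist + minimum requirements" could look like for a domain name, here is a rough sketch. The exact character list is my assumption and not a complete rule set: it rejects whitespace and a few URL-only characters, and requires at least one dot with something on both sides.

```javascript
// Blacklist: whitespace and / : @ ? # never belong in a bare domain name.
// Minimum requirements: at least one dot, with text before and after it.
const loosePattern = "[^\\s\\/:@?#]+\\.[^\\s\\/:@?#.]+";

// Same whole-value matching that the pattern attribute applies.
function looksLikeDomain(value) {
  return new RegExp("^(?:" + loosePattern + ")$").test(value);
}

console.log(looksLikeDomain("пример.рф"));          // IDN passes
console.log(looksLikeDomain("http://example.com")); // blacklisted characters
console.log(looksLikeDomain("example"));            // no dot
```

It will accept some junk (that's the point of a loose pattern), so the strict check still belongs on the server.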

David