I need regex to validate Firstname and Lastname fields.

People can have two names, for example, so it should be able to handle multiple words, including non-Latin ones.

([^\x00-\x7F]|\w)+

This is what I have for validating Latin and non-Latin characters, but it doesn't support multiple words.

If I enter JĀNIS BĀNIS, for example, it doesn't work!

  • Most regexp libraries support a "unicode" flag that changes the meaning of `\w`, `\b`, etc. In JS, though, the [relevant character class `\p{L}` support is partial](https://javascript.info/regexp-unicode). – 9000 Dec 11 '19 at 17:15
  • It's good that you're considering names with "non-Latin" characters, be careful that there are a lot of other incorrect assumptions you may be making about people's names: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ – Jordan Running Dec 11 '19 at 17:25
  • Have you tried [`[\u0000-\u017F]+`](https://regex101.com/r/zSqwwJ/1) for one or more in the wanted range? This will match [all Basic Latin until Latin Extended-A](https://unicode.org/charts/). Not only word characters. For only letters and digits, try something like this class [`[\da-zA-Z\u00C0-\u017F]+`](https://regex101.com/r/zSqwwJ/2) – bobble bubble Dec 11 '19 at 17:44

1 Answer

Querying the UCD (Unicode Character Database) for Latin-specific code points that are letters and numbers,
using this regex ( \w ) for Latin:

[\p{Block=Basic_Latin}\p{Block=Latin_1_Supplement}\p{Block=Latin_Extended_A}\p{Block=Latin_Extended_Additional}\p{Block=Latin_Extended_B}\p{Block=Latin_Extended_C}\p{Block=Latin_Extended_D}\p{Block=Latin_Extended_E}\p{Script=Latin}\p{Script_Extensions=Latin}](?<=\w)

yields this JavaScript-usable class:

[0-9A-Z_a-z\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u02E0-\u02E4\u0363-\u036F\u0485-\u0486\u0951-\u0952\u1D00-\u1D25\u1D2C-\u1D5C\u1D62-\u1D65\u1D6B-\u1D77\u1D79-\u1DBE\u1E00-\u1EFF\u2071\u207F\u2090-\u209C\u20F0\u212A-\u212B\u2132\u214E\u2183-\u2184\u2C60-\u2C7F\uA722-\uA788\uA78B-\uA7BF\uA7C2-\uA7C6\uA7F7-\uA7FF\uAB30-\uAB5A\uAB5C-\uAB67\uFB00-\uFB06\uFF21-\uFF3A\uFF41-\uFF5A]
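On engines that support Unicode property escapes (JavaScript with the `u` flag, ES2018 and later), a rough approximation of this class can be written without the generated ranges. The sketch below is not the class above: JavaScript regexes accept `Script` and `Script_Extensions` but not `Block` properties, so it assumes that `\p{Script=Latin}` plus ASCII digits and underscore is close enough. The name `latinWordChar` is made up for illustration.

```javascript
// Rough stand-in for the generated class: Latin-script code points
// plus ASCII digits and "_".
// Assumption: Script=Latin is close enough to the Block/Script union
// queried above; the two sets are not identical.
const latinWordChar = /^[0-9_\p{Script=Latin}]$/u;

console.log(latinWordChar.test("Ā")); // true  (U+0100, Latin Extended-A)
console.log(latinWordChar.test("z")); // true
console.log(latinWordChar.test("Ж")); // false (Cyrillic)
console.log(latinWordChar.test(" ")); // false (whitespace is handled separately)
```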

___________________

Doing the same for punctuation ( \p{P} ) for Latin:

[\p{Block=Basic_Latin}\p{Block=Latin_1_Supplement}\p{Block=Latin_Extended_A}\p{Block=Latin_Extended_Additional}\p{Block=Latin_Extended_B}\p{Block=Latin_Extended_C}\p{Block=Latin_Extended_D}\p{Block=Latin_Extended_E}\p{Script=Latin}\p{Script_Extensions=Latin}](?<=\p{P})

yields this JavaScript-usable class:

[!-#%-*,-/:-;?-@\[-\]_{}\u00A1\u00A7\u00AB\u00B6-\u00B7\u00BB\u00BF\u10FB\uA92E]

______________

Both can be combined with the whitespace construct \s to get a reasonable
name validation regex:

/^(?:[\s!-#%-*,-\/:-;?-@\[-\]_{}\u00A1\u00A7\u00AB\u00B6-\u00B7\u00BB\u00BF\u10FB\uA92E]*[0-9A-Z_a-z\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u02E0-\u02E4\u0363-\u036F\u0485-\u0486\u0951-\u0952\u1D00-\u1D25\u1D2C-\u1D5C\u1D62-\u1D65\u1D6B-\u1D77\u1D79-\u1DBE\u1E00-\u1EFF\u2071\u207F\u2090-\u209C\u20F0\u212A-\u212B\u2132\u214E\u2183-\u2184\u2C60-\u2C7F\uA722-\uA788\uA78B-\uA7BF\uA7C2-\uA7C6\uA7F7-\uA7FF\uAB30-\uAB5A\uAB5C-\uAB67\uFB00-\uFB06\uFF21-\uFF3A\uFF41-\uFF5A]+)+[\s!-#%-*,-\/:-;?-@\[-\]_{}\u00A1\u00A7\u00AB\u00B6-\u00B7\u00BB\u00BF\u10FB\uA92E]*$/

Expanded

^
(?:
   [\s!-#%-*,-/:-;?-@\[-\]_{}\u00A1\u00A7\u00AB\u00B6-\u00B7\u00BB\u00BF\u10FB\uA92E]*
   [0-9A-Z_a-z\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u02E0-\u02E4\u0363-\u036F\u0485-\u0486\u0951-\u0952\u1D00-\u1D25\u1D2C-\u1D5C\u1D62-\u1D65\u1D6B-\u1D77\u1D79-\u1DBE\u1E00-\u1EFF\u2071\u207F\u2090-\u209C\u20F0\u212A-\u212B\u2132\u214E\u2183-\u2184\u2C60-\u2C7F\uA722-\uA788\uA78B-\uA7BF\uA7C2-\uA7C6\uA7F7-\uA7FF\uAB30-\uAB5A\uAB5C-\uAB67\uFB00-\uFB06\uFF21-\uFF3A\uFF41-\uFF5A]+
)+
[\s!-#%-*,-/:-;?-@\[-\]_{}\u00A1\u00A7\u00AB\u00B6-\u00B7\u00BB\u00BF\u10FB\uA92E]*
$
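
A minimal sketch of using the combined pattern from JavaScript. It rebuilds the regex from the two classes via `.source` so the long ranges appear only once; `PUNCT_OR_SPACE`, `LATIN_WORD`, and `isValidName` are illustrative names, not part of the answer. The `[`-to-`]` range is written as `\[-\]` here so that the bracket cannot close the character class early.

```javascript
// Whitespace plus the Latin punctuation class from the answer.
const PUNCT_OR_SPACE =
  /[\s!-#%-*,-\/:-;?-@\[-\]_{}\u00A1\u00A7\u00AB\u00B6-\u00B7\u00BB\u00BF\u10FB\uA92E]/;

// Latin letters and numbers class from the answer.
const LATIN_WORD =
  /[0-9A-Z_a-z\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u02E0-\u02E4\u0363-\u036F\u0485-\u0486\u0951-\u0952\u1D00-\u1D25\u1D2C-\u1D5C\u1D62-\u1D65\u1D6B-\u1D77\u1D79-\u1DBE\u1E00-\u1EFF\u2071\u207F\u2090-\u209C\u20F0\u212A-\u212B\u2132\u214E\u2183-\u2184\u2C60-\u2C7F\uA722-\uA788\uA78B-\uA7BF\uA7C2-\uA7C6\uA7F7-\uA7FF\uAB30-\uAB5A\uAB5C-\uAB67\uFB00-\uFB06\uFF21-\uFF3A\uFF41-\uFF5A]/;

// Same shape as the expanded pattern: ^(?: P* W+ )+ P* $
const NAME_RE = new RegExp(
  `^(?:${PUNCT_OR_SPACE.source}*${LATIN_WORD.source}+)+${PUNCT_OR_SPACE.source}*$`
);

// Hypothetical helper: true when the whole input is Latin words
// separated by whitespace or punctuation.
function isValidName(input) {
  return NAME_RE.test(input);
}

console.log(isValidName("JĀNIS BĀNIS")); // true
console.log(isValidName("O'Connor"));    // true
console.log(isValidName("Иван"));        // false (Cyrillic is outside both classes)
console.log(isValidName(""));            // false (at least one word character is required)
```

Because `+` is nested inside `(?: ... )+`, a backtracking engine may take a long time to reject a long run of letters followed by a single disallowed character, so capping the input length before testing is a reasonable precaution.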