
I want to block all characters that could be used to form a script, such as #$%^&*<>~\[]{}@.,?|/

I cannot use /^[a-zA-Z]([\w -]*[a-zA-Z])?$/i.test(value) because my application has Spanish language support, which includes letters like ę Æ and so on.

How can I achieve this with a regex? Can anyone help me here? I am new to regex.

I want to block the special characters specified above, i.e. characters that can potentially form a script, in order to restrict user input.

Sarun UK
Zeus Carl
  • 1
    Sorry, what is the problem? Do you want to match a string that is fully composed of non-special chars? – Wiktor Stribiżew Nov 20 '20 at 14:33
  • 1
    “Block” them for what purpose? And what do single characters have to do with “script”? – CBroe Nov 20 '20 at 14:33
  • 1
    I want to block the special characters specified above, i.e. characters that can potentially form a script, in order to restrict user input. @CBroe – Zeus Carl Nov 20 '20 at 14:35
  • 3
    Yes, but what is the _context_? While this might make sense for a specific value such as a user name, it makes much less sense if we are talking about any free-form text input here. You say you are worried about characters from the Spanish language, but then you want to block simple punctuation characters such as the dot or comma, so entering actual natural-language text with multiple sentences would already be impossible in English. So, what is the _context_? – CBroe Nov 20 '20 at 14:38
  • 1
    And why do you think anything that _could_ form a “script”, needed blocking in the first place? If Stackoverflow did that, we could hardly have any discussion about code here at all. But those _don’t_ get blocked here, and yet this site is not constantly in danger of hacking … So, also in that regard, _what_ is the context? – CBroe Nov 20 '20 at 14:39
  • 1
    @CBroe I have such a use case, that's why I posted the question. I am stuck on the solution. – Zeus Carl Nov 20 '20 at 14:41
  • 2
    So just use a negated character class containing all those “bad” ones then? – CBroe Nov 20 '20 at 14:42
  • 1
    Are you aware that React already does this for you? Are you using `dangerouslySetInnerHTML`? Again, we can't answer your question without any context! – r3wt Nov 20 '20 at 14:43
  • 1
    I guess you just want to make your second regex Unicode aware. Try `/^\p{L}(?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})?$/iu.test(value)` – Wiktor Stribiżew Nov 20 '20 at 14:50
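The negated character class suggested in the comments above could look like the following minimal sketch. The character list is copied from the question; the names `noBlockedChars` and `hasNoBlockedChars` are hypothetical and only used for illustration. Note that this only rejects the listed characters and is not, by itself, a defense against script injection.

```javascript
// Matches only strings that contain none of the characters listed in the
// question: # $ % ^ & * < > ~ \ [ ] { } @ . , ? | /
// Inside the class, the backslash, the square brackets and the slash are escaped.
const noBlockedChars = /^[^#$%^&*<>~\\\[\]{}@.,?|\/]*$/;

// Hypothetical helper: true when the value contains no blocked character.
function hasNoBlockedChars(value) {
  return noBlockedChars.test(value);
}

console.log(hasNoBlockedChars("ę Æ niño")); // true: non-ASCII letters are not in the list
console.log(hasNoBlockedChars("<script>")); // false: contains < and >
console.log(hasNoBlockedChars("a.b"));      // false: contains a dot
```

Because the class is negated, every character that is not explicitly listed is accepted, including digits, whitespace, hyphens and quotes.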

1 Answer


The /^[a-zA-Z]([\w -]*[a-zA-Z])?$/i regex only matches ASCII characters.

If you plan to make it work with the Spanish language, you need to make it Unicode-aware.

Bearing in mind that a Unicode-aware \w can be represented with [\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}] (see "What's the correct regex range for javascript's regexes to match all the non word characters in any script?") and that the Unicode letter pattern is \p{L}, the direct Unicode equivalent of your pattern is

/^\p{L}(?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})?$/iu.test(value)

I also replaced the regular space with \s to match any kind of Unicode whitespace.

Details

  • ^ - start of string
  • \p{L} - any Unicode letter
  • (?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})? - an optional group: zero or more Unicode word chars (letters, diacritics, numbers, connector punctuation (like _), join control chars), whitespace or hyphens, followed by a single Unicode letter
  • $ - end of string.
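As a rough illustration of the difference, the sketch below runs the ASCII-only pattern from the question and the Unicode-aware pattern above against a few sample inputs. The names `asciiName`, `unicodeName` and `isValidName`, as well as the sample strings, are assumptions made for this example. It relies on Unicode property escapes (`\p{...}` with the `u` flag), which require an ES2018-capable engine.

```javascript
// ASCII-only pattern from the question.
const asciiName = /^[a-zA-Z]([\w -]*[a-zA-Z])?$/i;

// Unicode-aware pattern from this answer.
const unicodeName = /^\p{L}(?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})?$/iu;

// Hypothetical helper wrapping the Unicode-aware test.
function isValidName(value) {
  return unicodeName.test(value);
}

const samples = ["example", "niño", "Æther", "José-María", "<script>", "a.b", "example "];
for (const s of samples) {
  // Print the input next to the result of each pattern.
  console.log(JSON.stringify(s), "ascii:", asciiName.test(s), "unicode:", isValidName(s));
}
```

Both patterns require the value to start and end with a letter, so an input with a trailing space or a trailing digit is rejected by either one.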
Wiktor Stribiżew
  • What if I want to allow a space after the input value? If I enter `example` it works, but if I enter `example ` it shows an error. I want to allow a trailing space, as in `example `. Other than that, it works great! – Zeus Carl Nov 20 '20 at 16:34
  • @ZeusCarl Do you mean you have [this](https://regex101.com/r/6wb8Ri/1) where `example` has trailing spaces and you want to allow them? Add `\s*` at the end. See [this regex demo](https://regex101.com/r/6wb8Ri/2) where I am using a regular space instead of `\s` since the demo is run against a single multiline text. – Wiktor Stribiżew Nov 20 '20 at 16:38
  • 1
    Thanks, This is what I wanted. – Zeus Carl Nov 20 '20 at 16:41
  • What if I want to allow numbers? For example, `example3` causes an error, but I want to allow such inputs. – Zeus Carl Nov 20 '20 at 17:04
  • 1
    @ZeusCarl Then change `\p{L}` into `[\p{L}\p{N}]`, see [this regex demo](https://regex101.com/r/6wb8Ri/3). If you have a list of specific requirements, please share. Fixing example after example is too time consuming. – Wiktor Stribiżew Nov 20 '20 at 17:05
  • The only remaining issue is that an allowed character is accepted in the middle, as in `example- `, but shows an error when it comes at the end. So any value that ends with an allowed character, like `example-` or `example_`, should not generate an error. – Zeus Carl Nov 20 '20 at 17:11
  • 1
    So, all allowed chars are allowed everywhere and in any succession? Just use `/^(?:[^\p{P}\p{S}]|[_-])*$/u` (see [demo](https://regex101.com/r/6wb8Ri/6)). Or, just use `/^\p{L}[\p{L}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation} -]*$/u`, see [this demo](https://regex101.com/r/6wb8Ri/7) (where the first char must be a letter, and the rest is any zero or more of allowed chars). – Wiktor Stribiżew Nov 20 '20 at 17:16
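The two patterns from the last comment can be compared with a small sketch like the one below. The regexes are copied from that comment; the names `anyAllowedChars` and `letterFirst` and the sample inputs are assumptions for illustration only.

```javascript
// Any sequence of characters that are neither punctuation (\p{P}) nor
// symbols (\p{S}), with underscore and hyphen allowed as exceptions.
const anyAllowedChars = /^(?:[^\p{P}\p{S}]|[_-])*$/u;

// First char must be a letter; the rest is zero or more letters, marks,
// decimal digits, connector punctuation, regular spaces or hyphens.
const letterFirst = /^\p{L}[\p{L}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation} -]*$/u;

const samples = ["example-", "example_", "example3 ", "niño 2", "3example", "<script>", "a.b"];
for (const s of samples) {
  console.log(JSON.stringify(s), "any:", anyAllowedChars.test(s), "letterFirst:", letterFirst.test(s));
}
```

The first pattern also accepts the empty string and does not constrain the first character, while the second one rejects both an empty value and a value that starts with a digit or a hyphen.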