I know the regex for excluding words, roughly anyway. It would be (?!wordToIgnore|wordToIgnore2|wordToIgnore3)

But I have an existing, complicated regex that I need to add this to, and I am a bit confused about how to go about that. I'm still pretty new to regex, and it took me a very long time to make this particular one, but I'm not sure where to insert it or how ...

The regex I have is ...

^(?!.*[ ]{2})(?!.*[']{2})(?!.*[-]{2})(?:[a-zA-Z0-9 \:/\p{L}'-]{1,64}$)$

This should only allow the person typing to enter between 1 and 64 characters that match that pattern. The input cannot start with a space, quote, double quote, special character, dash, escape character, etc. It only allows a-z in both upper and lower case, and can include a space, ":", a dash, and a quote anywhere but the beginning.

But I want to forbid them from using certain words. I have this list of words that I want to be forbidden, but I cannot figure out how to fit it into the regex. I tried just pasting the whole "block" in, and that didn't work.

?!the|and|or|a|given|some|that|this|then|than

Has anyone encountered this before?

Ciel
  • Does it have to be a regex? – gog May 26 '14 at 17:43
  • It has to work with jQuery Validation, and ASP.NET MVC server side validation, so regex is preferable. If I cannot make it work with regex, I will go to more brute force methods of actual string manipulation/parsing, but I would prefer to not go that route if I can avoid it. – Ciel May 26 '14 at 17:46
  • When asking regex questions, please be sure to add language tags (regex behavior depends heavily on the regex engine used). For example, the regex above has a character class which includes `\p{L}` (Unicode letter), but this syntax is not recognized by JavaScript; its regex engine sees this as: \, `p`, `{`, `L` and `}`. – ridgerunner May 26 '14 at 18:48
  • Hey, thanks! I didn't know it was different in different languages. – Ciel May 26 '14 at 18:52

2 Answers

Ciel, first off, congratulations for getting this far trying to build your regex rule. If you want to read something detailed about all kinds of exclusions, I suggest you have a look at "Match (or replace) a pattern except in situations s1, s2, s3 etc".

Next, in your particular situation, here is how we could approach your regex.

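  1. For concision, let's make all the negative lookarounds more compact, replacing them with a single (?!.*(?: |-|'){2}). Note that this is slightly stricter than the three originals: it also rejects mixed pairs such as a space followed by a dash. To reject only the same character twice in a row, use a backreference instead: (?!.*([ '-])\1)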
  2. In your character class, the \: just escapes the colon, needlessly so as : is enough. I assume you wanted to add a backslash character, and if so we need to use \\
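  3. \p{L} includes a-zA-Z, so you can drop a-zA-Z from the class. But are you sure you want to match all letters in any script (Thai, etc.)? If so, remember to set the u flag after the regex string in engines that use one (PHP's PCRE, or JavaScript from ES2018 on); .NET handles \p{L} without any flag.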
  4. For your "bad word exclusion" applying to the whole string, place it at the same position as the other lookarounds, i.e., at the head of the string, but using the .* as in your other exclusions: (?!.*(?:wordToIgnore|wordToIgnore2|wordToIgnore3)) It does not matter which lookahead comes first because lookarounds do not change your position in the string. For more on this, see Mastering Lookahead and Lookbehind
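
One detail of point 1 is easy to check by hand: the compact lookahead and a backreference-based lookahead do not reject exactly the same inputs. The sketch below uses Python's re module purely for illustration (the question targets JavaScript and .NET); the names COMPACT and SAME_CHAR and the sample strings are made up for the example.

```python
import re

# Compact form from point 1: rejects ANY two consecutive characters
# drawn from space, dash and apostrophe, including mixed pairs.
COMPACT = re.compile(r"^(?!.*(?: |-|'){2}).*$")

# Backreference form: rejects only the SAME character twice in a row,
# which is what three separate lookaheads (one per character) do.
SAME_CHAR = re.compile(r"^(?!.*([ '-])\1).*$")

samples = ["Mary-Jane", "Mary--Jane", "Rock - Roll"]

for text in samples:
    # Print whether each variant accepts the sample.
    print(text, bool(COMPACT.match(text)), bool(SAME_CHAR.match(text)))
```

"Rock - Roll" is the interesting case: it contains a space next to a dash, so the compact form rejects it while the backreference form accepts it. Which behavior is wanted depends on the input rules.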

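This gives us this glorious regex (I added the case-insensitive flag as the inline modifier (?i); .NET accepts it, but JavaScript does not, so pass the i flag there instead):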

^(?i)(?!.*(?:wordToIgnore|wordToIgnore2|wordToIgnore3))(?!.*(?: |-|'){2})(?:[\\0-9 :/\p{L}'-]{1,64}$)$ 
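
To try the combined pattern outside the browser, here is a minimal sketch in Python. It is an adaptation, not a copy: Python's re module has no \p{L}, so [^\W\d_] is used as a stand-in for "any letter" (an assumption of this sketch), and the case-insensitive option is passed as a flag rather than inline. The names FORBIDDEN, LETTER and PATTERN are invented for the example.

```python
import re

# Placeholder words taken from the answer; substitute the real blacklist.
FORBIDDEN = ["wordToIgnore", "wordToIgnore2", "wordToIgnore3"]
forbidden = "|".join(re.escape(word) for word in FORBIDDEN)

# Stand-in for \p{L}: a word character that is neither a digit nor "_".
LETTER = r"[^\W\d_]"

PATTERN = re.compile(
    "^"
    + r"(?!.*(?:" + forbidden + r"))"             # no forbidden word anywhere
    + r"(?!.*(?: |-|'){2})"                       # no two consecutive space/dash/apostrophe
    + r"(?:" + LETTER + r"|[\\0-9 :/'-]){1,64}"   # 1 to 64 allowed characters
    + "$",
    re.IGNORECASE,
)

def is_valid(text):
    """Return True when text passes all three rules."""
    return PATTERN.match(text) is not None

for candidate in ["Mary-Jane O'Neil", "my wordtoignore2 here", "Mary  Jane"]:
    print(repr(candidate), is_valid(candidate))
```

Both lookaheads sit right after ^ and consume nothing, so the character-class part still sees the whole string from its first character.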

Of course, if you don't want Unicode letters, replace \p{L} with a-z.

Also, if you want to make sure that the wordToIgnore is a real word, as opposed to an embedded string (for instance you don't want cat but you are okay with catalog), add boundaries to the lookahead rule: (?!.*\b(?:wordToIgnore|wordToIgnore2|wordToIgnore3)\b)
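
The effect of the boundaries can be seen in a short sketch, again in Python for illustration only. The word list ("cat", "dog") and the names embedded and whole are hypothetical.

```python
import re

WORDS = ["cat", "dog"]  # hypothetical blacklist for illustration
alternation = "|".join(map(re.escape, WORDS))

# Without boundaries: any occurrence of the letters rejects the input.
embedded = re.compile(r"^(?!.*(?:" + alternation + r")).*$", re.IGNORECASE)

# With \b on both sides: only a standalone word rejects the input.
whole = re.compile(r"^(?!.*\b(?:" + alternation + r")\b).*$", re.IGNORECASE)

for text in ["catalog", "my cat", "Cat-sitter"]:
    # Print whether each variant accepts the sample.
    print(text, bool(embedded.match(text)), bool(whole.match(text)))
```

Only "catalog" is accepted, and only by the bounded variant. "Cat-sitter" is still rejected because a dash is not a word character, so \b matches between "Cat" and "-".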

zx81
  • Hey, this is an extremely helpful post, thank you so much. I've spent a very long time on the regex I presently have, and it wasn't easy - It's an extremely large "language" to learn! Your sample is exactly right, that's absolutely what I am trying to do. I will look over it in further detail as soon as I can get near a computer. – Ciel May 27 '14 at 02:24
  • @Ciel It was a treat to read your friendly and positive feedback, glad it works. :) In my view, for someone who's starting out with regex you're doing extremely well; the expression you had built had the right ideas. Once a few more pieces click you'll get up to speed in no time. :) If you find the time, you may find the questions referenced in the answer useful. Please feel free to revisit if some questions come up. – zx81 May 27 '14 at 06:09

use this:

^(?!.*(the|and|or|a|given|some|that|this|then|than))(?!.*[ ]{2})(?!.*[']{2})(?!.*[-]{2})(?:[a-zA-Z0-9 \:\p{L}'-]{1,64}$)$

see demo

Farvardin
  • Hmm, that doesn't quite seem to work. Putting in a normal name with two words doesn't pass the test. – Ciel May 26 '14 at 19:11
  • Here is an example of how it works right now, before I've added the word exclusions... I basically want the last two lines to be invalid input. http://regex101.com/r/vU0iQ4 – Ciel May 26 '14 at 19:18
  • That still doesn't accomplish it - the lines should fail because they contain words on the blacklist, not because they have 3 words. With that approach, they are only failing because they have 3 words instead of 2. – Ciel May 26 '14 at 20:08
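
A likely reason ordinary names fail with this pattern is that the blacklist is matched as plain substrings, so the one-letter entry "a" hits almost any name. The sketch below isolates the alternation to show which fragment triggers the rejection; it is written in Python for illustration, and first_hit, unbounded and bounded are names made up for the example.

```python
import re

# The blacklist from the question.
WORDS = ["the", "and", "or", "a", "given", "some", "that", "this", "then", "than"]
alternation = "|".join(WORDS)

# Substring search, as in the pattern above.
unbounded = re.compile(r"(?:" + alternation + r")", re.IGNORECASE)

# Whole-word search, using \b on both sides.
bounded = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def first_hit(pattern, text):
    """Return the first blacklisted fragment found in text, or None."""
    found = pattern.search(text)
    return found.group(0) if found else None

print(first_hit(unbounded, "Sarah Jane"))   # the lone "a" inside "Sarah"
print(first_hit(bounded, "Sarah Jane"))     # no whole-word hit
print(first_hit(bounded, "Jack and Jill"))  # the standalone word "and"
```

Wrapping the alternation in \b...\b inside the lookahead, as suggested in the other answer, keeps names like "Sarah Jane" valid while still rejecting inputs that contain a blacklisted word on its own.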