Sorry for being an absolute beginner when it comes to JavaScript and regex.

I have a Codepen: http://codepen.io/anon/pen/dNJNvK

What I want to accomplish is to validate whether a string contains some special UTF-8 characters, which is why I am working with RegExp. The pattern I have here returns false only if the string being tested equals one of those characters. But I want it to return false if the string contains one of them.

How can I accomplish this? I know it should be quite easy, but I wasn't able to get it working.

var regEx = new RegExp('[\u0001-\u00FF]');

console.log("This should be true: " + regEx.test("Tes"));
console.log("This should be false: " + regEx.test("Tes�"));

console.log("This returns false, because the string equals a special character: " + regEx.test("�"));
Thomas Ayoub
user5417542

2 Answers

Why not the other way around?

See Regular expression to match non-English characters?

Also, instead of test you could use match, or test the WHOLE string:

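var regEx = /[\x00-\x7F]/g; // ASCII range; can be added to
function okChar(str) {
  var res = str.match(regEx); // array of every matching character, or null
  if (res === null) return false; // no allowed character at all
  return res.length === str.length; // true only if every character matched
}
console.log("This should be true: " + okChar("Tes"));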
console.log("This should be false: " + okChar("Tesú"));

console.log("This returns false, because the string equals a special character: " + okChar("ú"));
mplungjan

as @Gabriel commented, it's returning true because there's at least one character in the string that matches your range

what you want to do is check that every character is within the range

/^[\u0001-\u00FF]+$/

or that any character is not within the range

/[^\u0001-\u00FF]/

in the second case you'd have true when a special character is used and false when all characters are safe, so you probably have to flip the checks you do afterward
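
A minimal sketch of the two checks above, using the range from the question. The name `isSafe` and the two constant names are made up for illustration, and the sample strings use explicit escapes (U+FFFD is assumed as the rejected symbol) so the result does not depend on how the file is encoded.

```javascript
// Hypothetical names; the character range is the one from the question.
const allInRange = /^[\u0001-\u00FF]+$/;   // every character must be in the range
const hasOutOfRange = /[^\u0001-\u00FF]/;  // at least one character is outside it

function isSafe(str) {
  // Flip the negated check so that "safe" strings give true.
  return !hasOutOfRange.test(str);
}

console.log(allInRange.test("Tes"));        // true
console.log(allInRange.test("Tes\uFFFD"));  // false
console.log(isSafe("Tes"));                 // true
console.log(isSafe("Tes\uFFFD"));           // false
console.log(isSafe("\u00E4\u00F6"));        // true (ä and ö are inside the range)
```

The two variants only disagree on the empty string: the anchored pattern needs at least one character because of the `+`, while the negated check treats an empty string as safe.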

alebianco
  • Thanks, the second one works perfectly for me. However, I am now a bit confused about why :D. I noticed that the range I defined was completely wrong. Looking at the UTF-8 table (https://de.wikipedia.org/wiki/UTF-8), I only wanted to leave out the characters in the range I specified. Normally this would have been the first two rows, so the special characters. But somehow it works the way I want: when a user enters ä, ö, and normal strings like "Test", everything is OK, but as soon as they enter a symbol like that question mark, it returns an error (I used !regex.test() so I get the right bool). – user5417542 Jan 30 '17 at 10:06
  • you can use a reference table of unicode characters like https://unicode-table.com/en/ to find what you actually need. right now you're taking the first 255 characters, which include the standard alphabet, plus numbers, some standard-ish symbols and a bunch of special characters: not a very sane range in my opinion :) – alebianco Jan 30 '17 at 10:23
  • just for a quick reference, the basic latin block ends at \u007F, and its printable characters go from \u0020 to \u007E – alebianco Jan 30 '17 at 10:26
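
To see why the range from the question lets ä and ö through but rejects the question-mark symbol, it can help to print the code points involved. This is only a sketch: `describe` is a hypothetical helper, the rejected symbol is assumed to be U+FFFD (the replacement character), and `printableAscii` assumes the printable part of Basic Latin, U+0020 to U+007E (U+007F is the DEL control character).

```javascript
// Hypothetical helper: list each character's code point as U+XXXX.
function describe(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

const questionRange = /^[\u0001-\u00FF]*$/;  // range used in the question
const printableAscii = /^[\u0020-\u007E]*$/; // printable Basic Latin only

console.log(describe("\u00E4\u00F6"));            // U+00E4 U+00F6
console.log(describe("\uFFFD"));                  // U+FFFD
console.log(questionRange.test("T\u00E4st"));     // true
console.log(questionRange.test("\uFFFD"));        // false
console.log(printableAscii.test("T\u00E4st"));    // false
console.log(printableAscii.test("Test"));         // true
```

Since ä (U+00E4) and ö (U+00F6) sit below U+00FF, the wider range accepts them, while U+FFFD lies far outside it. Which range is right depends on which characters the input is actually supposed to allow.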