
I'm trying to make sure that a string does not have any weird ASCII characters.

I'm trying to use character classes and negation.

 var tester =/[^\x00-\x001F\x007\x080-\xA1]+/i;

So: no ASCII characters in the ranges 00-1F, 07, or 80-A1 should be present; everything else should be fine.

I am coming back to regular expressions after a long time away, and the regular expression is NOT working: I want a string like "hello" to pass and a string like "†ack!" to fail. Or is my JavaScript/jQuery code wrong?

The code:

var tester2 = /^[^\x00-\x1f\x80-\xa1]+$/;
$('#testButton').click(function(){
    var text1 = $('#ackInput').val();
    console.log("text: " + text1);
    var allowed = tester2.test(text1);
    var feedback = "allowed?" + allowed;
    console.log(feedback);
    $('#errorTestInputAllowedChars').text(feedback);
});

An entry on jsFiddle is at http://jsfiddle.net/jillrenee42/WE79e/2/.
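To rule out the jQuery wiring, the same check can be run without the DOM. This is a minimal sketch: `isAllowed` is a hypothetical helper that mirrors the click handler, and the escaped sample strings stand in for what a user might type.

```javascript
// The question's pattern: anchored, so EVERY character must lie outside
// U+0000-U+001F and U+0080-U+00A1 (and the string must be non-empty).
const tester2 = /^[^\x00-\x1f\x80-\xa1]+$/;

// Hypothetical helper mirroring the click handler, minus jQuery.
function isAllowed(text) {
  return tester2.test(text);
}

console.log("allowed?" + isAllowed("hello"));      // allowed?true
console.log("allowed?" + isAllowed("\u0086ack!")); // allowed?false: U+0086 is in the banned range
console.log("allowed?" + isAllowed("\u2020ack!")); // allowed?true: the dagger is U+2020
```

Run this way, "†ack!" passes because the dagger's code point is nowhere near the banned ranges. The jQuery side behaves as written.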

Peter Mortensen
Jill
  • edited to make it clearer – Jill Dec 11 '13 at 15:39
  • @hwnd or `/^[ -~]+$/` – Pointy Dec 11 '13 at 15:40
  • International printable characters are desired, so /^[ -~]+$/ does not meet the business requirements. – Jill Dec 11 '13 at 15:57
  • [Here is a jsfiddle that shows the fixed regex working.](http://jsfiddle.net/j33TK/) – Pointy Dec 11 '13 at 16:15
  • The "dagger" symbol † is not 86 hex; it's the Unicode character U+2020. – Pointy Dec 11 '13 at 16:16
  • @Pointy: how did you figure that out (that it's not 86 hex, but a Unicode character way outside the range)? Thanks. – Jill Dec 11 '13 at 16:20
  • I looked it up [using this reference](http://www.elizabethcastro.com/html/extras/entities.html). I don't have any idea why the HTML entities were set up that way. The characters in the upper half of 8-bit encodings are not well standardized, I think; real "ASCII" is only a 7-bit code. – Pointy Dec 11 '13 at 16:21
  • So this is not just a lesson in regex, but in Unicode: http://www.utf8-chartable.de/ and http://stackoverflow.com/questions/10361579/are-unicode-and-ascii-characters-the-same – Jill Dec 11 '13 at 16:37
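A character's actual code point can be checked from any JavaScript console. This is a small sketch; `codePoints` is a hypothetical helper. The Windows-1252 remark is offered as a likely explanation for why "extended ASCII" tables list the dagger at 86 hex: in that 8-bit encoding, byte 0x86 is the dagger. JavaScript strings are Unicode, though, so the dagger is U+2020.

```javascript
// The dagger as JavaScript sees it: U+2020 ("†").
const dagger = "\u2020";
console.log(dagger.codePointAt(0).toString(16)); // 2020

// Byte 0x86 is the dagger only in Windows-1252. In Unicode, U+0086 is an
// invisible C1 control character, so the two are different characters.
console.log("\u0086" === dagger); // false

// Hypothetical helper: list each character's code point as U+XXXX.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Returns ["U+2020", "U+0061", "U+0063", "U+006B", "U+0021"]
console.log(codePoints("\u2020ack!"));
```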

3 Answers


You need to make sure that the whole string matches:

var tester = /^[^\x00-\x001F\x007\x080-\xA1]+$/i;

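That \x notation doesn't seem correct to me (in JavaScript, \x takes exactly two hex digits, so \x001F is read as \x00 followed by the literal characters 1 and F), and this works when I try it: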

var tester = /^[^\u0000-\u001F\u0080-\u00A1]+$/i;
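Reading the question's class the way a JavaScript regex engine does explains why "hello" is rejected by the first pattern. This is a small sketch comparing the two patterns from this answer. The breakdown in the comments follows the rule that \x consumes exactly two hex digits.

```javascript
// The question's class, anchored as in this answer. Because \x takes exactly
// two hex digits, the engine reads the class as:
//   \x00-\x00, "1", "F", \x00, "7", \x08, and the range "0"-\xA1
// That last range (U+0030-U+00A1) covers every ASCII digit and letter.
const broken = /^[^\x00-\x001F\x007\x080-\xA1]+$/i;

// This answer's version, using four-digit \u escapes.
const fixed = /^[^\u0000-\u001F\u0080-\u00A1]+$/i;

console.log(broken.test("hello"));    // false: "h" (U+0068) is in "0"-\xA1
console.log(fixed.test("hello"));     // true
console.log(fixed.test("a\u0001b"));  // false: U+0001 is a control character
```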
Pointy
  • I tried that. Then the string "hello" is false (not true, as it should be). – Jill Dec 11 '13 at 15:36
  • I updated the fiddle (http://jsfiddle.net/jillrenee42/WE79e/1/) and it now passes EVERYTHING, including the †ack string. – Jill Dec 11 '13 at 15:47
  • @JillRenee you did not update the fiddle correctly; your regular expression is still invalid, and it doesn't include the `^` and `$` at the start and end. – Pointy Dec 11 '13 at 15:49
  • How are you testing it? This is very frustrating; so far nothing is working for me. – Jill Dec 11 '13 at 16:00
  • @JillRenee The "dagger" symbol has a character code that falls far outside the ranges you are disallowing. – Pointy Dec 11 '13 at 16:05
  • @JillRenee in other words, the dagger symbol *should* pass the test. – Pointy Dec 11 '13 at 16:09
  • But the "dagger" is 86 in hex, which would be between 80 and A1 (see http://www.ascii-code.com/). There is something here I am not getting/understanding. – Jill Dec 11 '13 at 16:14
  • Changed the regex to `var tester = /^[^\u0000-\u001F\u007F-\u00A1]+$/i;` and I'm using that one. @Pointy, thanks for all your help! – Jill Dec 11 '13 at 16:40
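Jill's final pattern also bans U+007F (DEL, the last ASCII control character), so the excluded set becomes U+0000-U+001F and U+007F-U+00A1. Here is a quick check of that pattern. The sample strings are illustrative, and "café" stands in for the international characters that must still pass.

```javascript
// Excludes U+0000-U+001F and U+007F-U+00A1; everything else is allowed.
const tester = /^[^\u0000-\u001F\u007F-\u00A1]+$/i;

console.log(tester.test("hello"));     // true
console.log(tester.test("caf\u00E9")); // true: é (U+00E9) is above U+00A1
console.log(tester.test("a\u007Fb"));  // false: U+007F is DEL
```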

In JavaScript, \x hex escapes take exactly two hex digits, so the following will work for you:

/^[^\x00-\x1F\x07\x80-\xFF]+$/

JavaScript Regex Reference
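The two-digit rule is easy to verify directly. In this small sketch, note that the pattern above bans \x80-\xFF, which is wider than the question's 80-A1. As a result, Latin-1 letters such as é are rejected too.

```javascript
// \x always consumes exactly two hex digits; anything after them is literal.
console.log(/^\x41$/.test("A"));   // true: \x41 is "A"
console.log(/^\x411$/.test("A1")); // true: \x41, then a literal "1"

// Up to 0xFF, \xHH and \u00HH denote the same character.
console.log("\x1F" === "\u001F"); // true

// The pattern from this answer.
const tester = /^[^\x00-\x1F\x07\x80-\xFF]+$/;
console.log(tester.test("hello"));     // true
console.log(tester.test("caf\u00E9")); // false: é (U+00E9) is inside \x80-\xFF
```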

anubhava

Your hex encodings of your desired ranges are incorrect. You want this instead:

[^\x00-\x1f\x80-\xa1]

Note that I left out \x07 because it's already in the range \x00-\x1f.
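Because \x07 (BEL) already lies inside \x00-\x1f, listing it separately changes nothing. This brute-force sketch compares the class with and without it on every UTF-16 code unit.

```javascript
// Single-character tests of the class with and without the explicit \x07.
const withBell = /^[^\x00-\x1f\x07\x80-\xa1]$/;
const withoutBell = /^[^\x00-\x1f\x80-\xa1]$/;

// Count code units where the two classes disagree.
let differences = 0;
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  if (withBell.test(ch) !== withoutBell.test(ch)) differences++;
}

console.log(differences); // 0
```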


EDIT

As Pointy points out, you will also need to anchor the pattern with ^ and $ so that it tests the entire string.
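Without ^ and $, test() succeeds as soon as any run of allowed characters appears, even if a banned one sits elsewhere in the string. This minimal sketch contrasts the two. It also shows an equivalent alternative that searches for a single banned character and negates the result. The variable names are illustrative.

```javascript
// Unanchored: passes as soon as one allowed character appears anywhere.
const unanchored = /[^\x00-\x1f\x80-\xa1]+/;
// Anchored: every character from start (^) to end ($) must be allowed.
const anchored = /^[^\x00-\x1f\x80-\xa1]+$/;
// Alternative: look for one banned character and negate the result.
const banned = /[\x00-\x1f\x80-\xa1]/;

const sample = "ok\u0001"; // "ok" followed by the control character U+0001

console.log(unanchored.test(sample)); // true: "ok" alone satisfies it
console.log(anchored.test(sample));   // false
console.log(!banned.test(sample));    // false

// The two correct checks differ only on the empty string, because of "+".
console.log(anchored.test(""));       // false
console.log(!banned.test(""));        // true
```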

maček