Regex to test if only ASCII characters

Question

I have tried this, but it returns true for both UTF-8 (non-ASCII) and ASCII strings:

 console.log(/[^w/]+/.test("abc123")) //true
 console.log(/[^w/]+/.test("ابت")) //true

Note that your regex as written currently tests that your string does not consist entirely of `w` and `/` characters. You probably meant to do `\w` instead of `w/`. – apsillers, Aug 02 '13 at 13:22
Try using `^\w+$`; you need to force it to match the complete string. `\w` only matches unaccented Latin letters, digits and underscore (`[A-Za-z0-9_]`), not the rest of Latin-1. You can use `\W` for the inverse match. – jishi, Aug 02 '13 at 13:23
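A quick way to see what the two comments above describe is to run both patterns side by side. This is only a sketch; the variable names `opPattern` and `wordOnly` are made up for illustration.

```javascript
// The OP's pattern: a class that excludes the literal letter "w" and "/".
// It is not anchored, so one such character anywhere in the string is enough.
const opPattern = /[^w/]+/;

// jishi's suggestion: anchored, so every character must be a word
// character (A-Z, a-z, 0-9 or _).
const wordOnly = /^\w+$/;

for (const s of ["abc123", "ابت", "www///"]) {
  console.log(JSON.stringify(s), opPattern.test(s), wordOnly.test(s));
}
```

`"abc123"` passes both, `"ابت"` passes only the OP's pattern, and `"www///"` passes neither, because it contains nothing but `w` and `/`.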

MDEV · Accepted Answer · 2013-08-02T13:45:43.507


I think you meant /[^\w]+/ but what you really want, from what I can gather, is:

console.log(/^[\x00-\x7F]+$/.test("abc123")) //true
console.log(/^[\x00-\x7F]+$/.test("abc_-8+")) //true
console.log(/^[\x00-\x7F]+$/.test("ابت")) //false
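If the check is needed in several places, it can be wrapped in a small helper. The names `isAscii` and `isAsciiLoop` below are hypothetical; the second function is a regex-free version of the same test, included only to make the "every code unit is 0x7F or below" rule explicit.

```javascript
// Hypothetical helper based on the pattern above: true when every
// UTF-16 code unit in `str` is in the ASCII range 0x00-0x7F.
// Because of "+", the empty string returns false.
function isAscii(str) {
  return /^[\x00-\x7F]+$/.test(str);
}

// The same check without a regex, comparing each code unit directly.
function isAsciiLoop(str) {
  if (str.length === 0) return false; // mirror the "+" in the regex
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0x7f) return false;
  }
  return true;
}

console.log(isAscii("abc_-8+"), isAsciiLoop("abc_-8+")); // true true
console.log(isAscii("ابت"), isAsciiLoop("ابت"));         // false false
console.log(isAscii(""), isAsciiLoop(""));               // false false
```

Changing `+` to `*` (and dropping the length check in the loop) would make the empty string count as ASCII; which behaviour is right depends on the caller.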

If you didn't actually mean to check the full ASCII set, you can just use:

console.log(/^[\w]+$/.test("abc123")) //true
console.log(/^[\w]+$/.test("abc_-8+")) //false
console.log(/^[\w]+$/.test("ابت")) //false

About \x notation

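\xFF is hexadecimal escape notation (list here), used in this example to specify the range 00 to 7F, which covers the full ASCII character set. \x00-\x7F is functionally identical to a-z in that it specifies a range; however, hex notation lets us state the range endpoints by their exact character codes, which is more reliable for characters that are hard to type or invisible (such as control characters).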
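A few checks that illustrate the notation: `\xHH` stands for the character whose code is the hex number `HH`, so a hex range and a letter range can describe exactly the same characters. The name `asciiChar` is just for this sketch.

```javascript
// "\xHH" means "the character whose code is the two hex digits HH",
// both in string literals and inside regex character classes.
const asciiChar = /^[\x00-\x7F]$/; // exactly one ASCII character

console.log("\x41" === "A");               // true: 0x41 is 65, the code of "A"
console.log(/^[\x61-\x7A]+$/.test("abc")); // true: 0x61-0x7A is the same range as a-z
console.log(asciiChar.test("\x7F"));       // true: 0x7F (DEL) is the last ASCII code
console.log(asciiChar.test("\x80"));       // false: 0x80 is just past the ASCII range
```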

\w matches 'word' characters, which is the same as [A-Za-z0-9_]
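One way to convince yourself of this equivalence is a brute-force comparison over every UTF-16 code unit. This sketch assumes the regexes are used without flags, since flags can affect what `\w` matches.

```javascript
// Compare \w with an explicit class for every UTF-16 code unit (0x0000-0xFFFF).
// Assumes no flags are set on either regex.
const word = /\w/;
const explicit = /[A-Za-z0-9_]/;

let mismatches = 0;
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  if (word.test(ch) !== explicit.test(ch)) mismatches++;
}
console.log(mismatches); // 0

// A lowercase-only class is narrower: it rejects capital letters.
console.log(/^[a-z0-9_]+$/.test("ABC"), /^\w+$/.test("ABC")); // false true
```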

edited Aug 02 '13 at 13:45

answered Aug 02 '13 at 13:21

MDEV


How about this: `/^[^a-zA-Z0-9]+$/`? – Sami Aug 02 '13 at 13:28

@Sami — There are lots of ASCII characters that aren't letters or numbers. – Quentin Aug 02 '13 at 13:28
@Sami As Quentin mentioned that's not the full ASCII set of characters, see my updated answer – MDEV Aug 02 '13 at 13:30
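To make Quentin's point concrete: `/^[^a-zA-Z0-9]+$/` tests "contains no ASCII letters or digits", which is a different question from "is all ASCII". A small sketch (the name `noAlnum` is just for illustration):

```javascript
// Sami's suggestion: every character must be something OTHER than an
// ASCII letter or digit.
const noAlnum = /^[^a-zA-Z0-9]+$/;

console.log(noAlnum.test("ابت"));    // true
console.log(noAlnum.test("!@# _"));  // true, although every character is ASCII
console.log(noAlnum.test("abc123")); // false, although every character is ASCII
```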
@SmokeyPHP: That looks right to me, but it would be really great if you could link to documentation explaining how the \x syntax works and describing the ASCII character range in Unicode. – Quentin Aug 02 '13 at 13:30
Your first pattern for the full ASCII set strangely does not work. It always returns `true`. However, the second one (`/^[\w]+$/`) works. – Sami Aug 02 '13 at 13:36

@Sami `console.log(/^[\x00-\x7F]+$/.test("ابت"))` Doesn't return true - what are you getting true from that you didn't expect to? – MDEV Aug 02 '13 at 13:37
@Quentin Am trying to find a decent link now, not much out there that explains it properly – MDEV Aug 02 '13 at 13:38
Be aware that with `\w` Unicode-enabled regexp will (correctly) include Arabic 'word' characters as well (and Greek, Cyrillic, Thai ... and so on). The OP *probably* means "plain ASCII" but *may* also mean "Latin-1". – Jongware Aug 02 '13 at 13:54
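That holds for engines such as .NET or Python 3's `re`. In JavaScript, however, `\w` stays limited to `[A-Za-z0-9_]` even with the `u` flag (as long as `i` is not also set). If "word characters in any script" is what you want in JavaScript, one option is a Unicode property escape (ES2018+, requires `u`); the class below is an illustrative assumption of what "word character" should mean, not an exact copy of other engines' `\w`.

```javascript
const arabic = "ابت";

// Even with the "u" flag, JavaScript's \w only covers A-Z, a-z, 0-9 and _.
console.log(/^\w+$/u.test(arabic)); // false

// Unicode property escapes: any letter (\p{L}) or number (\p{N}), plus "_".
console.log(/^[\p{L}\p{N}_]+$/u.test(arabic)); // true
```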

In case the OP indeed means to check *plain* ASCII: there is an expression for that: `/^[[:ascii:]]+$/` – Jongware Aug 02 '13 at 13:56
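`[[:ascii:]]` is POSIX bracket syntax supported by engines such as PCRE; JavaScript's `RegExp` does not support POSIX classes, so the same literal quietly means something else there. The sketch below shows what actually happens, and uses the ES2018 Unicode property escape `\p{ASCII}` (requires the `u` flag and a reasonably modern engine) as a JavaScript equivalent of `[\x00-\x7F]`.

```javascript
// In JavaScript, "[[:ascii:]" is an ordinary class containing "[", ":",
// "a", "s", "c", "i", and the following "]" is a literal bracket
// (legacy, non-"u" syntax). So this is NOT an ASCII check.
const posixLike = /^[[:ascii:]]+$/;
console.log(posixLike.test("abc")); // false
console.log(posixLike.test("a]"));  // true

// Unicode property escape for the ASCII property (U+0000-U+007F).
const asciiProp = /^\p{ASCII}+$/u;
console.log(asciiProp.test("abc123")); // true
console.log(asciiProp.test("ابت"));    // false
```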
@Jongware According to the table on [this page](http://www.regular-expressions.info/posixbrackets.html) they're the same thing? – MDEV Aug 02 '13 at 13:59
@SmokeyPHP: `/^[[:ascii:]]+$/` and `/^\w+$/` are not the same! – Cylian Aug 02 '13 at 14:52
@Cylian I know... I was referring to `[\x00-\x7F]` – MDEV Aug 02 '13 at 14:53
@SmokeyPHP: They are the same, but the shortcut is easier to remember. Kind of like `[0-9]` versus `\d`. – Jongware Aug 02 '13 at 15:15
Failed for: `sdfsdfsक`... a single character at the end made it fail. – cosmoloc May 22 '15 at 12:51
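It is not clear from the comment which pattern was used. For reference, the intended result for this string is `false`, since its last character (Devanagari KA, U+0915) is outside 0x00-0x7F. If a `true` result was seen instead, one likely cause is that the `^`/`$` anchors were dropped, as this sketch shows:

```javascript
const s = "sdfsdfsक"; // last character is Devanagari KA, U+0915 (not ASCII)

// Unanchored: succeeds because the ASCII prefix "sdfsdfs" matches.
console.log(/[\x00-\x7F]+/.test(s));   // true

// Anchored: every character must be ASCII, so the final "क" makes it fail.
console.log(/^[\x00-\x7F]+$/.test(s)); // false
```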
