13

I'm trying to find some text only if it contains English letters and numbers, using JavaScript/jQuery.

I'm wondering what the most efficient way to do this is. Since there could be thousands of words, it should be as fast as possible, and I don't want to use regex.

 var names[0] = 'test';
 var names[1] = 'हिन';
 var names[2] = 'لعربية';

 for (i=0;i<names.length;i++) {
    if (names[i] == ENGLISHMATCHCODEHERE) {
        // do something here
    }
 }

Thank you for your time.

Alec Smart
  • 3
    What's wrong with using regex? – NakedBrunch Mar 08 '10 at 16:40
  • Wouldn't regex be slow? Especially on hundreds of words. – Alec Smart Mar 08 '10 at 16:44
  • 2
    I think you need to explain *why* you don't want to use regex. Regular Expressions exist for precisely this kind of problem: string pattern matching. I'd warrant that regular expressions will be 'as fast as possible' so, without more information, excluding them as a solution doesn't make sense. – Dancrumb Mar 08 '10 at 16:45
  • Side note: Your sample code is invalid, the initial set of declarations are syntax errors. I assume you mean `var names = [];` and then `names[0] = ...`, etc. (no `var` in front of them) and that you're building an array of names. – T.J. Crowder Mar 08 '10 at 16:50
  • 1
    @Alec: *"Wouldn't regex be slow? Especially on hundreds of words."* No, almost certainly faster (not just faster, but **lots** faster) than just about anything else you're going to do, in fact. Be sure you're reusing (rather than recompiling) the regex in each loop iteration (a literal is fine). – T.J. Crowder Mar 08 '10 at 16:51
  • @T.J. Sorry this was just an example... Am an advanced JS programmer. Was hoping for a faster solution than regex. – Alec Smart Mar 08 '10 at 16:52
  • 1
    Slow as compared to, say, chunking through the strings one character at a time making sure they all fall within a specific range? Implement it with regexes. If the solution proves too slow, cross that bridge when you come to it. But when your problem looks like a nail, feels like a nail, smells like a nail, is long and pointy like a nail, and has a label saying "Hi! I'm a nail!", quit overthinking your solution and grab a hammer. – BlairHippo Mar 08 '10 at 16:54
  • @Alec: See my comment above (overlapping comments). I very much doubt you're going to do it faster in JavaScript than the JavaScript engine's built-in regex handling. – T.J. Crowder Mar 08 '10 at 16:54
  • Thanks everyone for your inputs :) – Alec Smart Mar 08 '10 at 17:06

5 Answers

34

A regular expression for this might be:

var english = /^[A-Za-z0-9]*$/;

Now, I don't know whether you'll want to include spaces and stuff like that; the regular expression could be expanded. You'd use it like this:

if (english.test(names[i])) // ...
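
As a rough sketch of how the pattern could be expanded, the example below also allows plain spaces. The space inside the character class and the name `englishWithSpaces` are assumptions for illustration, not part of the original answer, and the sample array assumes `names` was meant to be an array of strings.

```javascript
// Compiled once and reused for every word.
var english = /^[A-Za-z0-9]*$/;
// Hypothetical expanded variant that also allows plain spaces.
var englishWithSpaces = /^[A-Za-z0-9 ]*$/;

var names = ['test', 'हिन', 'لعربية', 'two words'];

var strict = names.filter(function (name) {
  return english.test(name);
});
var relaxed = names.filter(function (name) {
  return englishWithSpaces.test(name);
});

console.log(strict);  // ["test"]
console.log(relaxed); // ["test", "two words"]
```

Any other characters you want to accept (punctuation, for instance) could be added to the character class in the same way.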

Also see this: Regular expression to match non-English characters?

Edit: my brain filtered out the "I don't want to use a regex" because it failed the "isSilly()" test. You could always check the character code of each letter in the word, but that's going to be slower (maybe much slower) than letting the regex matcher work. The built-in regular expression engine is really fast.

When you're worried about performance, always do some simple tests first before making assumptions about the technology (unless you've got intimate knowledge of the technology already).
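
One way to run such a simple test is sketched below. The helper names (`byRegex`, `byCharCode`, `time`) and the sample data are made up for illustration, and timings from a micro-benchmark like this vary by engine and input, so treat the numbers as a rough indication only.

```javascript
var english = /^[A-Za-z0-9]*$/;

function byRegex(s) {
  return english.test(s);
}

// Manual alternative: check every character code against 0-9, A-Z, a-z.
function byCharCode(s) {
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    var ok = (c >= 48 && c <= 57) || (c >= 65 && c <= 90) || (c >= 97 && c <= 122);
    if (!ok) return false;
  }
  return true;
}

// Build a few thousand sample words (made-up test data).
var words = [];
for (var n = 0; n < 5000; n++) {
  words.push(n % 3 === 0 ? 'हिन' + n : 'word' + n);
}

// Run fn over all words several times; return elapsed ms and match count.
function time(fn) {
  var start = Date.now();
  var matches = 0;
  for (var round = 0; round < 100; round++) {
    matches = 0;
    for (var i = 0; i < words.length; i++) {
      if (fn(words[i])) matches++;
    }
  }
  return { ms: Date.now() - start, matches: matches };
}

var regexResult = time(byRegex);
var charCodeResult = time(byCharCode);
console.log('regex:    ', regexResult.ms, 'ms');
console.log('charCode: ', charCodeResult.ms, 'ms');
```

Both functions should report the same number of matches; only the elapsed times are expected to differ.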

Pointy
5

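Iterate over each character in the string and check whether its character code falls outside the Latin letters: 65 to 90 are uppercase A-Z and 97 to 122 are lowercase a-z. The codes 91 to 96 in between are punctuation ([ \ ] ^ _ `), so they have to be excluded as well.

To allow punctuation characters too, add their character codes to the check.

function isLatinString(s) {
  var i, charCode;
  for (i = s.length; i--;) {
    charCode = s.charCodeAt(i);
    // reject anything outside A-Z (65-90) and a-z (97-122)
    if (charCode < 65 || charCode > 122 || (charCode > 90 && charCode < 97))
      return false;
  }
  return true;
}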

// tests
[
  "abxSDSzfgr", 
  "aAzZ123dsfsdfעחלעלחי", 
  "abc!", 
  "$abc", 
  "123abc",
  " abc"
]
.forEach(s => console.log(   isLatinString(s), s   ))

Another way, using an explicit whitelist string to allow specific characters:

function isLatinString(s){
  var c, whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  for( c in s ) // visit each character index of the argument string
    // if the whitelist string doesn't include the character, fail
    if( !whitelist.includes(s[c].toUpperCase()) ) 
      return false
  return true
}

// tests
[
  "abCD", 
  "aAאב", 
  "abc!", 
  "$abc", 
  "1abc",
  " abc"
]
.forEach(s => console.log(   isLatinString(s), s   ))
vsync
  • If you need to customize this function, this will come in handy : http://www.cambiaresearch.com/articles/15/javascript-char-codes-key-codes – eric.itzhak Dec 12 '13 at 09:41
5

If you're dead set against using regexes, you could do something like this:

// Whatever valid characters you want here
var ENGLISH = {};
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".split("").forEach(function(ch) {
    ENGLISH[ch] = true;
});

function stringIsEnglish(str) {
    var index;

    for (index = str.length - 1; index >= 0; --index) {
        if (!ENGLISH[str.substring(index, index + 1)]) {
            return false;
        }
    }
    return true;
}

Live Example:

console.log("valid", stringIsEnglish("valid"));
console.log("invalid", stringIsEnglish("invalid!"));

...but a regex (/^[a-z0-9]*$/i.test(str)) would almost certainly be faster. It is in this synthetic benchmark, but those are often unreliable.

T.J. Crowder
1

Using a regex is the fastest way to do this, I'm afraid. To my knowledge, this should be the fastest approach:

var names = 'test',
var names[1] = 'हिन';
var names[2] = 'لعربية';

//algorithm follows
var r = /^[a-zA-Z0-9]+$/,
    i = names.length;

while (i--) { // i-- rather than --i, so that index 0 is tested too
    if (r.test(names[i])) {
        // do something here
    }
}
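
For reference, a runnable variant of the loop might look like the sketch below. It assumes `names` was meant to be an array of strings; the added empty string and the `matched` array are illustrative only. Note that `+` in the pattern rejects the empty string, whereas a `*` quantifier would accept it.

```javascript
// Assumption: names is an array of strings.
var names = ['test', 'हिन', 'لعربية', ''];

var r = /^[a-zA-Z0-9]+$/, // compiled once, reused below
    i = names.length,
    matched = [];

// i-- visits every index from names.length - 1 down to 0.
while (i--) {
  if (r.test(names[i])) {
    matched.push(names[i]);
  }
}

console.log(matched); // ["test"]
```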
raveren
  • Your code contains several errors. You cannot declare names[1] with var keyword. You cannot modify string item using the index, as it is an immutable object. – BaseScript Sep 04 '22 at 22:41
1

You should consider words that may contain special characters. For example, isn't "it's" English?
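
One possible way to account for this, if a regex is acceptable, is to add the extra characters to the character class. The sketch below assumes that only the straight apostrophe and the hyphen should be allowed in addition to letters and digits; the name `englishWord` is made up for illustration.

```javascript
// Hypothetical pattern: letters, digits, straight apostrophe and hyphen.
var englishWord = /^[A-Za-z0-9'-]+$/;

console.log(englishWord.test("it's"));       // true
console.log(englishWord.test('well-known')); // true
console.log(englishWord.test('हिन'));         // false
```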

mhd196