2
input char:a      (unicode:97) output type:2
input char:Space  (unicode:32) output type:12

In Java I can use the code "int type = Character.getType(unicode)" (the Character.getType API).

nroe

2 Answers

2

There is a regexp plugin which supports Unicode categories: http://xregexp.com/plugins/.

Using that, you could write a function that tests a character against each category in turn:

var types = [
    'Ll', 'Lu', 'Lt', 'Lm', 'Lo', 'Mn', 'Mc', 'Me', 'Nd', 'Nl',
    'No', 'Pd', 'Ps', 'Pe', 'Pi', 'Pf', 'Pc', 'Po', 'Sm', 'Sc',
    'Sk', 'So', 'Zs', 'Zl', 'Zp', 'Cc', 'Cf', 'Co', 'Cs', 'Cn'
];

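function getType(ch) {
    // Only the first UTF-16 code unit of the input is classified.
    ch = String(ch).charAt(0);
    for (var i = 0; i < types.length; i++) {
        // \p{..} is provided by XRegExp's Unicode categories plugin.
        if (XRegExp("\\p{" + types[i] + "}").test(ch)) {
            return types[i];
        }
    }
    // Falls through (returns undefined) if no category matched.
}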

alert(getType(" ")); // alerts Zs, because " " is a space separator character

http://jsfiddle.net/pimvdb/mYfCZ/1/
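If you need the same integers that Java's Character.getType returns (2 for "a", 12 for a space, as in the question), a lookup table from category name to Java's constant values is one way to get there. This is only a sketch: `JAVA_TYPE` and `toJavaType` are hypothetical names, and the numbers are the values of the `Character` constants such as `LOWERCASE_LETTER` and `SPACE_SEPARATOR`.

```javascript
// Values of Java's Character.* type constants, keyed by Unicode general category.
// Note that 17 is not used by Java.
var JAVA_TYPE = {
    Cn: 0,  Lu: 1,  Ll: 2,  Lt: 3,  Lm: 4,  Lo: 5,
    Mn: 6,  Me: 7,  Mc: 8,  Nd: 9,  Nl: 10, No: 11,
    Zs: 12, Zl: 13, Zp: 14, Cc: 15, Cf: 16, Co: 18,
    Cs: 19, Pd: 20, Ps: 21, Pe: 22, Pc: 23, Po: 24,
    Sm: 25, Sc: 26, Sk: 27, So: 28, Pi: 29, Pf: 30
};

// Hypothetical helper: converts a category name such as "Zs" into Java's
// numeric type, or -1 if the name is not a known category.
function toJavaType(category) {
    return Object.prototype.hasOwnProperty.call(JAVA_TYPE, category)
        ? JAVA_TYPE[category]
        : -1;
}
```

Combined with the function above, `toJavaType(getType(" "))` would give the Java-style number for a character.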

pimvdb
  • Very cool, but this function depends on regular expressions, so it will be slow. Is there any faster way to do it? – nroe Aug 12 '11 at 05:16
  • This is a good solution. If you want it to be faster, nroe, you can simply cache the type for each character that the `getType` function has already found an answer to. – slevithan Jul 29 '12 at 09:46
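
The caching suggested in the comment above could look roughly like this. It is a minimal sketch, not part of the original answer: `cached` is a hypothetical wrapper that works with any single-character classifier, including the `getType` function from the answer.

```javascript
// Hypothetical helper: wraps a single-character classifier so that each
// distinct character is classified at most once.
function cached(classify) {
    var cache = {};
    return function (ch) {
        // Normalize the same way getType does: first UTF-16 code unit only.
        ch = String(ch).charAt(0);
        if (!Object.prototype.hasOwnProperty.call(cache, ch)) {
            cache[ch] = classify(ch);
        }
        return cache[ch];
    };
}
```

Usage would be something like `var fastGetType = cached(getType);`. The first lookup of a character still pays for the regex tests, but repeated lookups are a plain property read.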
1

Well, there's the nodeType property, which will tell you whether a DOM node is a text node or an HTML element, for example. As for obtaining the Unicode category of a character, I don't believe there is a native function for that. You can try this plugin, which adds Unicode support to regular expressions:

http://xregexp.com/plugins/

http://www.javascriptkit.com/domref/nodetype.shtml
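
For what it's worth, engines that implement ES2018 Unicode property escapes accept `\p{...}` in a native RegExp when the `u` flag is set, so the plugin is not strictly required there. The following is a sketch under that assumption; `CATEGORIES`, `MATCHERS` and `nativeGetType` are hypothetical names, and the patterns are compiled once up front rather than on every call.

```javascript
// Assumes an engine with ES2018 Unicode property escapes (\p{..} with the u flag).
var CATEGORIES = [
    'Ll', 'Lu', 'Lt', 'Lm', 'Lo', 'Mn', 'Mc', 'Me', 'Nd', 'Nl',
    'No', 'Pd', 'Ps', 'Pe', 'Pi', 'Pf', 'Pc', 'Po', 'Sm', 'Sc',
    'Sk', 'So', 'Zs', 'Zl', 'Zp', 'Cc', 'Cf', 'Co', 'Cs', 'Cn'
];

// Compile one regex per category a single time.
var MATCHERS = CATEGORIES.map(function (name) {
    return { name: name, re: new RegExp("^\\p{" + name + "}$", "u") };
});

// Hypothetical helper: returns the general category of the first code point
// of the input, or undefined for an empty string.
function nativeGetType(input) {
    var first = Array.from(String(input))[0]; // first code point, not code unit
    if (first === undefined) {
        return undefined;
    }
    for (var i = 0; i < MATCHERS.length; i++) {
        if (MATCHERS[i].re.test(first)) {
            return MATCHERS[i].name;
        }
    }
    return undefined;
}
```

Unlike `charAt(0)`, `Array.from` splits the string by code point, so characters outside the Basic Multilingual Plane are classified as a whole rather than as a lone surrogate.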

AlienWebguy