7

I need a regex that also matches Chinese, Greek, Russian, and other non-Latin letters. Basically, I want to remove punctuation and numbers.

Until now I have removed punctuation and numbers "manually", but that does not seem to be very consistent.

Another thing I have tried is

/[\p{L}]/

but that is not supported by Mozilla (I use this in a Firefox extension).

slosd
  • Do you need to just match letters according to the particular user's language (which just means you need a locale aware regex engine) or do you need to match anything that is a letter in any possible language? – balpha Jul 04 '09 at 20:52
  • 2
    And which punctuation do you need to remove? Do you need to remove the apostrophe in O'Brien? – John Saunders Jul 04 '09 at 20:56
  • `[\p{P}\p{N}]` describes punctuation and numbers. – Gumbo Jul 04 '09 at 20:59
  • Thanks for that great question. I would also like that, but was sure it was not possible. – User Jul 04 '09 at 21:30
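
As a rough illustration of the `[\p{P}\p{N}]` suggestion in the comments: in an engine that supports Unicode property escapes together with the `u` flag (added in ES2018, so not available in the Firefox versions this thread is about), the removal could be sketched as below. The function name `stripPunctuationAndNumbers` is made up for this example.

```javascript
// Sketch only: assumes an engine with ES2018 Unicode property escapes
// (requires the "u" flag). The function name is hypothetical.
function stripPunctuationAndNumbers(text) {
    // \p{P} = any punctuation, \p{N} = any kind of number.
    // Symbols such as "$" or "+" belong to \p{S} and are left untouched.
    return text.replace(/[\p{P}\p{N}]+/gu, "");
}

console.log(stripPunctuationAndNumbers("Привет, мир! 2009"));
```

Note that this also removes the apostrophe in a name like O'Brien, which is the case raised in the comments; whether that is wanted depends on the application.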

2 Answers

4

Have you given XRegExp and the Unicode plugin a try/look?

<script src="xregexp.js"></script>
<script src="xregexp-unicode.js"></script>
<script>
    var unicodeWord = XRegExp("^\\p{L}+$");
    alert(unicodeWord.test("Ниндзя")); // -> true
</script>
Jonathan Lonowski
  • 1
    Thanks, that's exactly what I was looking for. However, I don't really want to include an 8 KB library that I use only once in my extension. The Unicode ranges in the Unicode plugin are very helpful, and I think I will use those to write something myself. – slosd Jul 05 '09 at 08:38
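
The "write something myself" approach from the comment above might look roughly like this sketch, which needs no library and no `\p{...}` support. The ranges are hand-picked and deliberately incomplete (for example, accented Greek letters and most other scripts are missing); they are assumptions for illustration, not the plugin's actual data, and the names `LETTER_RANGES` and `stripNonLetters` are hypothetical.

```javascript
// Sketch only: a small, incomplete set of letter ranges from the
// Basic Multilingual Plane, chosen by hand for illustration.
var LETTER_RANGES =
    "A-Za-z" +                                          // Basic Latin letters
    "\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF" +   // Latin-1 letters
    "\\u0100-\\u024F" +                                 // Latin Extended-A and -B
    "\\u0391-\\u03A1\\u03A3-\\u03A9\\u03B1-\\u03C9" +   // unaccented Greek
    "\\u0401\\u0410-\\u044F\\u0451" +                   // Russian alphabet
    "\\u4E00-\\u9FFF";                                  // CJK Unified Ideographs

// Matches runs of characters that are neither listed letters nor whitespace.
var nonLetter = new RegExp("[^" + LETTER_RANGES + "\\s]+", "g");

function stripNonLetters(text) {
    return text.replace(nonLetter, "");
}

console.log(stripNonLetters("Hello, мир! 123 你好"));
```

Because this is a whitelist, anything outside the listed ranges is removed, including letters from scripts that were not listed, so the ranges would need extending for real use.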
1

You can find a lot of complaints about the current ECMA spec on regular expressions not handling Unicode characters the way it should, e.g. a blog entry by Scott Hanselman that links back to an SO question ;-)
There's no "real" solution to this problem yet, but take a look at the answers to "Javascript + Unicode regexes" (your question is more or less a duplicate of it). (Edit: I take that back; the Unicode plugin Jonathan Lonowski suggests looks pretty nice.)

Community
VolkerK