Pattern matching for Swedish characters

Question

I need help with a regular expression.

I have to match strings like this: âãa34dc

The pattern I have used:

\s*[a-zA-Z]+[a-zA-Z_0-9]*\s

But this pattern does not match strings such as âãa34dc.

P.S. â and ã are Swedish characters.

Please help me find the correct pattern for this kind of string.

A minor correction, which probably does not change the validity of existing answers: "â" and "ã" are not used in the Swedish language, except for spelling foreign names or places. What OP wants is probably "åäö/ÅÄÖ". — allansson, May 09 '17 at 16:21

David Yaw · Answer 1 · 2012-04-06T18:58:59.620

Do you actually want to restrict it to Swedish characters? In other words, should a German character not match? If so, then you'll probably have to enumerate the whole alphabet, and include that.

If what you really want is to match every alphabetic character, use the regular expression terms for matching all letters.

\w matches any word character, but that also includes digits and some punctuation, such as the underscore. That's close, but not exactly what you want for your first term; it is a candidate for your second term.

For the first term, where you don't want to include numbers, specifying that the character should be a Unicode 'letter' class will work. \p{L} specifies all Unicode characters that are a letter. This includes [a-zA-Z], and all the Swedish characters, and German, and Russian, etc.

Therefore, I think this regular expression is what you want:

\s*[\p{L}][\p{L}_0-9]*\s

If you want to include digits from other character sets, and some other punctuation, then you can use [\w]* for the second term.
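As a rough illustration of the \p{L} idea in Python: to my knowledge the standard-library `re` module does not accept `\p{L}`, so this sketch checks the Unicode general category directly with `unicodedata` instead of using a regex. The helper names `is_letter` and `is_letter_identifier` are made up for this example, and the check applies to a whole token rather than to text surrounded by whitespace.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # General categories Lu, Ll, Lt, Lm and Lo all start with "L",
    # which is the set that \p{L} describes.
    return unicodedata.category(ch).startswith("L")


def is_letter_identifier(token: str) -> bool:
    # Hypothetical helper: mimics [\p{L}][\p{L}_0-9]* applied to the whole token.
    if not token or not is_letter(token[0]):
        return False
    return all(
        is_letter(ch) or ch == "_" or ch in "0123456789"
        for ch in token[1:]
    )


print(is_letter_identifier("âãa34dc"))  # True
print(is_letter_identifier("åäö_1"))    # True
print(is_letter_identifier("34abc"))    # False: starts with a digit
```

In an engine that does support `\p{L}` (for example .NET), the regular expression above should express the same rule directly.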

score 0 · Answer 2 · answered Apr 06 '12 at 18:32

Please give a set of rules.

According to your question:

    [X-Ya-zA-Z]{3}[0-9]{2}[a-zA-Z]{2}

Replace X with the first Swedish letter.

Replace Y with the last Swedish letter.
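A minimal sketch of the enumeration approach in Python, assuming the extra letters wanted are å, ä, ö and their uppercase forms. These letters are not contiguous in Unicode (ä is U+00E4, å is U+00E5, ö is U+00F6), so a range such as `ä-ö` would also admit letters like é; listing them explicitly avoids that. The names `SWEDISH_LETTERS`, `NAME` and `is_swedish_name` are invented for this example, and the pattern follows the "letter first, then letters, digits or underscore" rule rather than the fixed lengths above.

```python
import re

# Explicit list of accepted letters: ASCII plus the three extra Swedish letters.
SWEDISH_LETTERS = "a-zA-ZåäöÅÄÖ"

# First character must be a letter; the rest may be letters, digits or "_".
NAME = re.compile(rf"[{SWEDISH_LETTERS}][{SWEDISH_LETTERS}_0-9]*")


def is_swedish_name(token: str) -> bool:
    # fullmatch requires the entire token to fit the pattern.
    return NAME.fullmatch(token) is not None


print(is_swedish_name("åäa34dc"))  # True
print(is_swedish_name("âãa34dc"))  # False: â and ã are not in the list
print(is_swedish_name("9lives"))   # False: starts with a digit
```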

Royi Namir


The rule is the same as for English letters. A name may start with a Swedish character or with an English letter. So I want a pattern that matches all the conditions I described above. Thanks for your time, and sorry for the unclear question. – user1213444 Apr 06 '12 at 18:41

score 0 · Answer 3 · edited May 23 '17 at 12:34

John Machin provides a great answer for this. Adapting his pattern, what you need is probably something similar to: \s*[^\W\d_]\w*\s*

P.S. I removed the + quantifier from your first part. Any subsequent letters would be matched by the subsequent quantified \w.
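For what it's worth, the core of this pattern can be tried in Python, where `\w` is Unicode-aware by default for `str` patterns. This is only a sketch: the surrounding `\s*` is left out, `fullmatch` is used to test whole tokens, and the names `IDENTIFIER` and `looks_like_identifier` are made up for the example.

```python
import re

# [^\W\d_] = a word character that is neither a digit nor an underscore,
# i.e. roughly "a letter of any alphabet"; \w* then allows letters,
# digits and underscores.
IDENTIFIER = re.compile(r"[^\W\d_]\w*")


def looks_like_identifier(token: str) -> bool:
    return IDENTIFIER.fullmatch(token) is not None


for token in ["âãa34dc", "åäö_1", "34abc", "_tmp"]:
    print(token, looks_like_identifier(token))
# âãa34dc True
# åäö_1 True
# 34abc False
# _tmp False
```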

answered Apr 06 '12 at 18:48

Douglas


The rule is the same as for English letters. A name may start with a Swedish character or with an English letter, and it may also contain digits and underscores. So I want a pattern that matches all the conditions I described above. Your answer did not help me solve my problem. – user1213444 Apr 06 '12 at 19:10

No, `\w` is not the same as `[A-Za-z0-9_]`. In a Unicode-aware environment (such as .NET), `\w` will match any letter of any alphabet (including Swedish). Did you actually bother trying out my pattern? – Douglas Apr 06 '12 at 19:54
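
The difference between the two meanings of `\w` can be seen in Python as well (not .NET, so treat this only as an analogy): `str` patterns are Unicode-aware by default, and the `re.ASCII` flag restricts `\w` to `[a-zA-Z0-9_]`. The variable names below are chosen just for this sketch.

```python
import re

sample = "âãa34dc"

unicode_word = re.compile(r"\w+")          # default: \w covers letters of any alphabet
ascii_word = re.compile(r"\w+", re.ASCII)  # \w limited to [a-zA-Z0-9_]

print(unicode_word.fullmatch(sample) is not None)  # True
print(ascii_word.fullmatch(sample) is not None)    # False
print(ascii_word.findall(sample))                  # ['a34dc']
```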
