I need help with a regular expression.

I have to match strings like this: âãa34dc

The pattern I have used:

\s*[a-zA-Z]+[a-zA-Z_0-9]*\s

But this pattern does not match strings such as âãa34dc.

P.S. â and ã are Swedish characters.

Please help me find the correct pattern for this kind of string.

David Yaw
  • A minor correction, which probably does not change the validity of existing answers: "â" and "ã" are not used in the Swedish language, except for spelling foreign names or places. What OP wants is probably "åäö/ÅÄÖ". – allansson May 09 '17 at 16:21

3 Answers

Do you actually want to restrict it to Swedish characters? In other words, should a German character not match? If so, then you'll probably have to enumerate the whole alphabet and include it in the character class.
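As a rough illustration of the enumeration approach, here is a minimal Python sketch. It assumes the Swedish alphabet is a-z plus å, ä, ö (and their uppercase forms); the names `SWEDISH` and `NAME` are made up for this example.

```python
import re

# Assumption: Swedish letters are a-z plus å ä ö (U+00E5, U+00E4, U+00F6)
# and their uppercase forms Å Ä Ö (U+00C5, U+00C4, U+00D6).
SWEDISH = "a-zA-Z\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"

# First character: a Swedish letter; the rest: letters, underscore or ASCII digits.
NAME = re.compile(rf"[{SWEDISH}][{SWEDISH}_0-9]*")

print(bool(NAME.fullmatch("\u00e5\u00e4\u00f6_12")))   # True
print(bool(NAME.fullmatch("\u00e2\u00e3a34dc")))       # False: â and ã are not enumerated
```

Note that the sample string from the question is rejected by this version, which is why the enumeration only makes sense if other alphabets really must be excluded.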

If what you really want is to match every alphabetic character, use the regular expression terms for matching all letters.

\w matches any word character, but that includes digits and some punctuation (such as the underscore). That's close, but not exactly what you want for your second term.

For the first term, where you don't want to include numbers, specifying that the character should be a Unicode 'letter' class will work. \p{L} specifies all Unicode characters that are a letter. This includes [a-zA-Z], and all the Swedish characters, and German, and Russian, etc.

Therefore, I think this regular expression is what you want:

\s*[\p{L}][\p{L}_0-9]*\s

If you want to include digits from other character sets, and some other punctuation, then you can use [\w]* for the second term.
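To make the idea of a Unicode letter class concrete, here is a small Python sketch. As far as I know, Python's built-in `re` module does not support `\p{L}`, so this version checks the Unicode general category by hand instead of using a regex; `is_letter` and `matches_name` are hypothetical helper names, and the whitespace parts of the pattern are left out.

```python
import unicodedata

def is_letter(ch: str) -> bool:
    # General categories starting with "L" (Lu, Ll, Lt, Lm, Lo) are the letters,
    # which is the set that \p{L} refers to.
    return unicodedata.category(ch).startswith("L")

def matches_name(text: str) -> bool:
    # Hand-written counterpart of \p{L}[\p{L}_0-9]* applied to the whole string.
    if not text or not is_letter(text[0]):
        return False
    return all(is_letter(ch) or ch == "_" or ch in "0123456789" for ch in text[1:])

print(matches_name("\u00e2\u00e3a34dc"))  # True  (the sample string from the question)
print(matches_name("\u00e5\u00e4\u00f6_1"))  # True  (Swedish å ä ö)
print(matches_name("34abc"))              # False (starts with a digit)
```

In an engine that supports `\p{L}` directly, such as .NET, the regular expression above does the same job in one line.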

David Yaw

Please give a set of rules.

According to your question:

    [X-Ya-zA-Z]{3}[0-9]{2}[a-zA-Z]{2}

Replace X with the first Swedish letter.

Replace Y with the last Swedish letter.

Royi Namir
  • The rule is the same as for English letters. A name may start with a Swedish character or with an English letter. So I want a pattern that matches all the conditions I wrote above. Thanks for your time, and sorry for the unclear question. – user1213444 Apr 06 '12 at 18:41

John Machin provides a great answer for this. Adapting his pattern, what you need is probably something similar to: \s*[^\W\d_]\w*\s*

P.S. I removed the + quantifier from your first part. Any subsequent letters would be matched by the subsequent quantified \w.
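A quick way to try this out is the following Python sketch. It assumes Python 3, where `str` patterns are Unicode-aware by default; the names `ASCII_ONLY` and `UNICODE_AWARE` are just labels for this example.

```python
import re

# The pattern from the question: ASCII letters only.
ASCII_ONLY = re.compile(r"\s*[a-zA-Z]+[a-zA-Z_0-9]*\s")

# The adapted pattern: [^\W\d_] is "a word character that is neither a digit
# nor an underscore", so the first character must be letter-like.
UNICODE_AWARE = re.compile(r"\s*[^\W\d_]\w*\s*")

sample = "\u00e2\u00e3a34dc"  # the sample string from the question

print(ASCII_ONLY.fullmatch(sample + " "))            # None
print(bool(UNICODE_AWARE.fullmatch(sample)))         # True
print(bool(UNICODE_AWARE.fullmatch("3abc")))         # False: starts with a digit
```

`fullmatch` is used here so that the whole input has to fit the pattern; with `search` the ASCII-only pattern would still find the `a34dc` tail inside a longer string.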

Community
Douglas
  • The rule is the same as for English letters. A name may start with a Swedish character or with an English letter, and it may contain digits and underscores as well. So I want a pattern that matches all the conditions I wrote above. Your answer did not help me solve my problem. – user1213444 Apr 06 '12 at 19:10
  • No, `\w` is not the same as `[A-Za-z0-9_]`. In a Unicode-aware environment (such as .NET), `\w` will match any letter of any alphabet (including Swedish). Did you actually bother trying out my pattern? – Douglas Apr 06 '12 at 19:54