Regex to match only letters

Question

How can I write a regex that matches only letters?

What's your definition of `characters`? ASCII? Kanji? Iso-XXXX-X? UTF8? — Ivo Wetzel, Sep 01 '10 at 12:10
I have noticed that **\p{L}** (a letter) combined with the **/u** flag (Unicode) matches any letter in my regex, i.e. `/\p{L}+/u` – MaxZoom, Sep 26 '19 at 16:59

Gumbo · Answer 1 · 2010-09-01T12:17:11.640

582

Use a character set: [a-zA-Z] matches one letter from A–Z, in either lowercase or uppercase. [a-zA-Z]+ matches one or more letters, and ^[a-zA-Z]+$ matches only strings that consist entirely of one or more letters (^ and $ mark the beginning and end of the string, respectively).

If you want to match letters other than A–Z, you can either add them to the character set, as in [a-zA-ZäöüßÄÖÜ], or use a predefined character class such as the Unicode character property class \p{L}, which matches any Unicode character that is a letter.
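
As a rough illustration of the difference between finding letters somewhere in a string and requiring the whole string to be letters, here is a minimal Python sketch. The language choice and the helper name `is_ascii_letters` are assumptions made for the example, not part of the answer.

```python
import re

LETTERS = re.compile(r"[a-zA-Z]+")

def is_ascii_letters(text: str) -> bool:
    """Return True only if text consists of one or more ASCII letters."""
    # fullmatch works roughly like anchoring the pattern with ^ and $
    # (in Python, $ also matches just before a trailing newline,
    # which fullmatch does not allow).
    return LETTERS.fullmatch(text) is not None

print(LETTERS.search("abc123") is not None)  # a run of letters exists somewhere
print(is_ascii_letters("abc123"))            # but the whole string is not letters
print(is_ascii_letters("abc"))
```

Note that `is_ascii_letters("Müller")` is false here, which is the limitation the comments on this answer point out.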

edited Sep 01 '10 at 12:17

answered Sep 01 '10 at 12:09

Gumbo

165

That's a very ASCII-centric solution. This will break on pretty much any non-english text. – Joachim Sauer Sep 01 '10 at 12:13
12

@Joachim Sauer: It will rather break on languages using non-latin characters. – Gumbo Sep 01 '10 at 12:17
21

Already breaks on 90% of German text, don't even mention French or Spanish. Italian might still do pretty well though. – Ivo Wetzel Sep 01 '10 at 12:22
12

that depends on what definition of "latin character" you choose. J, U, Ö, Ä can all be argued to be latin characters or not, based on your definition. But they are all used in languages that use the "latin alphabet" for writing. – Joachim Sauer Sep 01 '10 at 12:23
15

\p{L} matches all the umlauts, cedillas, accents, etc., so you should go with that. – Radu Simionescu Oct 11 '16 at 08:45
Works well in a selector engine for determining if the selector is just a tag name. – user1329482 Oct 14 '17 at 17:40
What do you do if you can't use `[]` because Python is too thick to understand nestings? – AER Jan 11 '18 at 08:18
2

Instead of adding characters like äöüßÄÖÜ one at a time, you can use ^[a-zA-Z]\p{L}+$ to include most of the western alphabets. – Pablo Sep 07 '19 at 21:15

score 263 · Answer 2 · answered Sep 01 '10 at 12:10

\p{L} matches anything that is a Unicode letter if you're interested in alphabets beyond the Latin one
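
Where \p{L} is not available (the comments below mention Python's built-in re module), one possible workaround is to check the Unicode general category directly. This is a different technique from the regex in this answer, sketched under the assumption that "letter" means any general category starting with L. The function name is hypothetical.

```python
import unicodedata

def is_unicode_letters(text: str) -> bool:
    """Return True if text is non-empty and every code point is a letter (Lu, Ll, Lt, Lm, Lo)."""
    return bool(text) and all(
        unicodedata.category(ch).startswith("L") for ch in text
    )

print(is_unicode_letters("Straße"))
print(is_unicode_letters("Жук"))
print(is_unicode_letters("abc1"))
```

This checks code points one by one, so a letter followed by a separate combining accent would be rejected. That case is what the \p{M} suggestion in the comments addresses.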

answered Sep 01 '10 at 12:10

RobV

2

not in all regex flavours. For example, vim regexes treat `\p` as "Printable character". – Philip Potter Sep 01 '10 at 12:12
4

[this page](http://www.regular-expressions.info/refflavors.html) suggests only java, .net, perl, jgsoft, XML and XPath regexes support \p{L}. But major omissions: python and ruby (though python has the regex module). – Philip Potter Sep 01 '10 at 12:16
6

@Philip Potter: Ruby supports Unicode character properties using that exact same syntax. – Jörg W Mittag Sep 01 '10 at 13:14
18

I think this should be `\p{L}\p{M}*+` to cover letters made up of multiple codepoints, e.g. a letter followed by accent marks. As per http://www.regular-expressions.info/unicode.html – ZoFreX Sep 16 '16 at 13:42
with python 3 this yields an error `bad escape \p at position 0` – matanster Apr 19 '19 at 16:23
Doesn't work in firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1361876 – Stefan Haustein Dec 02 '19 at 19:24
10

**JavaScript** needs `u` after regex to detect the unicode group: `/\p{Letter}/gu` – jave.web Mar 17 '21 at 16:13

score 71 · Answer 3 · edited Feb 08 '14 at 01:28

Depending on your meaning of "character":

[A-Za-z] - all letters (uppercase and lowercase)

[^0-9] - all non-digit characters

edited Feb 08 '14 at 01:28

António Almeida

answered Sep 01 '10 at 12:12

KristofMols

I meant letters. It doesn't appear to be working though. preg_match('/[a-zA-Z]+/', $name); – Nike Sep 01 '10 at 12:19
1

[A-Za-z] only declares which characters are allowed. You still need to declare how many times it may repeat: [A-Za-z]{1,2} (to match 1 or 2 letters) or [A-Za-z]{1,} (to match 1 or more letters, the same as [A-Za-z]+) – KristofMols Sep 01 '10 at 13:06
34

well à, á, ã, Ö, Ä... are letters too, so are অ, আ, ই, ঈ, Є, Ж, З, ﺡ, ﺥ, ﺩא, ב, ג, ש, ת, ... https://en.wikipedia.org/wiki/Letter_%28alphabet%29 – phuclv Sep 20 '16 at 09:50
@phuclv: Indeed, but that depends on the encoding, and the encoding is part of the settings of the program (either the default config or the one declared in a config file of the program). When I worked on different languages, I used to store that in a constant, in a config file. – Catalina Chircu Oct 14 '19 at 18:22
2

@CatalinaChircu encoding is absolutely irrelevant here. An encoding is a way to represent the code points of a character set in binary; for example, UTF-8 is an encoding for Unicode. Letters, OTOH, depend on the language, and if one says `[A-Za-z]` are letters then the language that's being used must be specified – phuclv Oct 15 '19 at 01:36
@phuclv: Indeed, I should have mentioned the language, not the encoding. The language is important and finding the letters in English is not the same as finding the letters in Spanish or French. If you do not take into account the diacritics in these languages you can cut words in two. – Catalina Chircu Oct 15 '19 at 15:32

score 36 · Answer 4 · answered Oct 17 '14 at 11:50

The closest option available is

[\u\l]+

which matches a sequence of uppercase and lowercase letters. However, it is not supported by all editors/languages, so it is probably safer to use

[a-zA-Z]+

as other users suggest

answered Oct 17 '14 at 11:50

blue_note

3

Won't match any special characters though. – Nyerguds May 25 '16 at 06:25
For a long time I had been using [A-z]+ but just noticed this allows a few special characters like ` and [ to slip in. [a-zA-Z]+ is indeed the way to go. – Eric Soyke Nov 30 '21 at 16:07

score 27 · Answer 5 · edited Sep 23 '17 at 23:23

You would use

/[a-z]/gi

[] - matches any single character from the set given inside the brackets

a-z - covers the entire alphabet, from a to z

g - searches globally, throughout the whole string

i - ignores case, so both uppercase and lowercase letters match

edited Sep 23 '17 at 23:23

Peter Mortensen

answered Apr 04 '16 at 10:01

Scott

score 21 · Answer 6 · answered Aug 20 '20 at 20:27

In python, I have found the following to work:

[^\W\d_]

This works because we are creating a new character class (the []) which excludes (^) any character from the class \W (everything NOT in [a-zA-Z0-9_]), also excludes any digit (\d) and also excludes the underscore (_).

That is, we have taken the character class [a-zA-Z0-9_] and removed the 0-9 and _ bits. You might ask, wouldn't it just be easier to write [a-zA-Z] then, instead of [^\W\d_]? You would be correct if dealing only with ASCII text, but when dealing with unicode text:

\W

Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_].

(from the Python re module documentation)

That is, we are taking everything considered to be a word character in unicode, removing everything considered to be a digit character in unicode, and also removing the underscore.

For example, the following code snippet

import re
regex = r"[^\W\d_]"  # raw string, so \W and \d reach the regex engine unchanged
test_string = "A;,./>>?()*)&^*&^%&^#Bsfa1 203974"
re.findall(regex, test_string)

Returns

['A', 'B', 's', 'f', 'a']
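
To see the Unicode behaviour described above, the same character class can be run with and without the ASCII flag. This is a small sketch; the sample text is made up for the example.

```python
import re

text = "abc çéàñ Жук 123_"

# Default for str patterns in Python 3: \w, \W and \d are Unicode-aware.
unicode_letters = re.findall(r"[^\W\d_]+", text)

# With re.ASCII, \w shrinks to [a-zA-Z0-9_], so only ASCII letters remain.
ascii_letters = re.findall(r"[^\W\d_]+", text, flags=re.ASCII)

print(unicode_letters)
print(ascii_letters)
```

With the default flags the accented and Cyrillic words are found as well; with `re.ASCII` only `abc` survives.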

What about non Latin letter? For example `çéàñ`. Your regex is less readable than `\p{L}` — Toto, Aug 21 '20 at 09:59
Clever answer. Works perfectly for accented letters as well. — Frederic, Oct 30 '20 at 21:04
@Toto Python's `re` module doesn't support Unicode properties. You have to use the re.UNICODE flag for Unicode support. Hence the `[^\W\d_]` pattern, which is the closest thing for "any letter" in Python's regex engine. — Thegerdfather, Mar 28 '23 at 06:25

score 18 · Answer 7 · edited Apr 15 '18 at 19:39

Java:

String s = "abcdef";

if (s.matches("[a-zA-Z]+")) {
    System.out.println("string only contains letters");
}

edited Apr 15 '18 at 19:39

Wiktor Stribiżew

answered Mar 22 '17 at 17:25

Udeshika Sewwandi

5

it doesn't include diacritic signs such as `ŹŻŚĄ` – karoluS Sep 24 '18 at 07:37
^ or any Cyrillic letters – dimitar.bogdanov May 04 '21 at 16:32

score 16 · Answer 8 · answered Sep 13 '16 at 07:05

The regular expression that a few people have written as "/^[a-zA-Z]$/i" is not correct: the trailing /i only makes it case-insensitive, and matching stops after the first match. Instead of /i, use /g, which is the global flag. You also do not need the ^ and $ anchors if the goal is to find every run of letters inside a string rather than to validate the whole string.

/[a-zA-Z]+/g

[a-zA-Z]+ matches a single character present in the list below
Quantifier +: between one and unlimited times, as many times as possible, giving back as needed
a-z: a single character in the range between a and z (case sensitive)
A-Z: a single character in the range between A and Z (case sensitive)
g modifier: global. All matches (don't return on first match)
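
To make the difference between a global search and an anchored match concrete, here is a small Python sketch. Python has no /g flag, so `re.findall` stands in for it here; that mapping is an assumption about the closest equivalent, not something the answer states.

```python
import re

text = "abc 123 def"

# Like /[a-zA-Z]+/g: collect every run of letters in the string.
runs = re.findall(r"[a-zA-Z]+", text)

# Like /^[a-zA-Z]+$/: succeed only if the whole string is letters.
whole = re.fullmatch(r"[a-zA-Z]+", text)

print(runs)
print(whole)
```

The unanchored search finds both words, while the anchored match fails because of the digits and spaces. Which one is right depends on whether the goal is extraction or validation.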

score 14 · Answer 9 · answered Sep 01 '10 at 12:12

/[a-zA-Z]+/

Super simple example. Regular expressions are extremely easy to find online.

http://www.regular-expressions.info/reference.html

answered Sep 01 '10 at 12:12

Scott Radcliff

score 13 · Answer 10 · answered Nov 14 '13 at 16:22

For PHP, the following will work fine:

'/^[a-zA-Z]+$/'

answered Nov 14 '13 at 16:22

Rohit Dubey

score 10 · Answer 11 · edited Jun 08 '14 at 19:53

Just use \w or [:alpha:] (written as [[:alpha:]] in flavors that support POSIX classes). \w is an escape sequence that matches only characters that may appear in words.
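
A quick check of what \w accepts, sketched in Python (the sample string is made up for the example):

```python
import re

sample = "abc_123"

# \w covers letters, digits and the underscore, so the whole sample matches.
word_match = re.fullmatch(r"\w+", sample)

# A letters-only class stops at the underscore.
letter_match = re.match(r"[a-zA-Z]+", sample)

print(word_match.group())
print(letter_match.group())
```

So \w on its own is broader than "only letters", as the comments below also point out.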

edited Jun 08 '14 at 19:53

Amal Murali

answered May 28 '14 at 13:33

Agaspher

9

`\w` may not be a good solution in all cases. At least in PCRE, `\w` can match other characters as well. Quoting the [PHP manual](http://uk3.php.net/manual/en/regexp.reference.escape.php): "*A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.*". – Amal Murali Jun 08 '14 at 19:56
1

words include characters other than letters – V-SHY May 15 '15 at 03:05
4

`\w` means match letters and numbers – Eugen Konkov Aug 26 '16 at 16:10
how to match words with only alphabet characters? – y_159 Aug 19 '22 at 11:09

score 10 · Answer 12 · answered Jun 27 '17 at 11:44

Use character groups

\D

Matches any character except digits 0-9

^\D+$

See example here
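
For reference, a short Python sketch of what ^\D+$ accepts and rejects (the candidate strings are made up for the example):

```python
import re

pattern = re.compile(r"^\D+$")

for candidate in ["abc", "a b!?", "abc1"]:
    # \D only excludes digits, so spaces and punctuation still pass.
    print(repr(candidate), bool(pattern.search(candidate)))
```

Only the string containing a digit is rejected; the one with a space and punctuation is accepted.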

answered Jun 27 '17 at 11:44

Tomáš Nedělka

12

This will also match whitespace, symbols, etc. which does not seem to be what the question is asking for. – DaveMongoose Jan 02 '18 at 09:31

Javi Marzán · Answer 13 · 2021-03-26T14:43:24.293

So, I've been reading a lot of the answers, and most of them don't take exceptions into account, like letters with accents or diaeresis (á, à, ä, etc.).

I wrote a function in TypeScript for my own use case; it should carry over to any language that supports regular expressions. What I basically did was add a range of letters for each kind of accent I wanted to accept. I also convert the char to upper case before applying the RegExp, which saves me some work.

function isLetter(char: string): boolean {
  return char.toUpperCase().match('[A-ZÀ-ÚÄ-Ü]+') !== null;
}

If you want to add another range of letters with another kind of accent, just add it to the regex. Same goes for special symbols.

I implemented this function with TDD and I can confirm this works with, at least, the following cases:

    character | isLetter
    ${'A'}    | ${true}
    ${'e'}    | ${true}
    ${'Á'}    | ${true}
    ${'ü'}    | ${true}
    ${'ù'}    | ${true}
    ${'û'}    | ${true}
    ${'('}    | ${false}
    ${'^'}    | ${false}
    ${"'"}    | ${false}
    ${'`'}    | ${false}
    ${' '}    | ${false}

@VadimAidlin then you need to add it to the RegExp string like in the provided code.(`firstLetter-lastLetter`). To make sure that it works, you can implement a test that checks your use cases. — Javi Marzán, Mar 22 '23 at 12:47

score 6 · Answer 14 · edited Jun 08 '14 at 19:53

If you mean any letters in any character encoding, then a good approach might be to delete non-letters like spaces \s, digits \d, and other special characters like:

[!@#\$%\^&\*\(\)\[\]:;'",\. ...more special chars... ]

Or negate the classes above to describe letters directly:

\S \D and [^  ..special chars..]

Pros:

Works with all regex flavors.
Easy to write, and sometimes saves a lot of time.

Cons:

Long, sometimes not perfect, but character encoding can be broken as well.

Motlab · Answer 15 · 2014-07-29T07:33:48.343

5

You can try this regular expression : [^\W\d_] or [a-zA-Z].

edited Jul 29 '14 at 07:33

answered Jul 25 '14 at 13:27

Motlab

That is not what `[^\W|\d]` means – OGHaza Jul 25 '14 at 13:34
1

`[^\W|\d]` means not `\W` and not `|` and not `\d`. It has the same net effect since `|` is part of `\W` but the `|` does not work as you think it does. Even then that means it accepts the `_` character. You are probably looking for `[^\W\d_]` – OGHaza Jul 25 '14 at 14:47
I agree with you, it accepts the `_`. But "NOT" `|` is equivalent to "AND", so `[^\W|\d]` means: NOT `\W` **AND** NOT `\d` – Motlab Jul 25 '14 at 15:01
12

`[^ab]` means not `a` and not `b`. `[^a|b]` means not `a` and not `|` and not `b`. To give a second example `[a|b|c|d]` is exactly the same as `[abcd|||]` which is exactly the same as `[abcd|]` - all of which equate to `([a]|[b]|[c]|[d]|[|])` the `|` is a literal character, not an OR operator. The OR operator is implied between each character in a character class, putting an actual `|` means you want the class to accept the `|` (pipe) character. – OGHaza Jul 25 '14 at 15:53
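
The point about the pipe being a literal inside a character class is easy to verify. A minimal Python sketch:

```python
import re

with_pipe = re.findall(r"[a|b]", "a|b|c")     # the pipe is part of the class
without_pipe = re.findall(r"[ab]", "a|b|c")   # only a and b are in the class

print(with_pipe)
print(without_pipe)
```

The first call returns the pipe characters along with `a` and `b`; the second returns only the letters.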

score 4 · Answer 16 · answered Feb 11 '20 at 18:27

Lately I have used this pattern in my forms to check names of people, containing letters, blanks and special characters like accent marks.

pattern="[A-zÀ-ú\s]+"

answered Feb 11 '20 at 18:27

cblnpa

1

You should have a look at an ASCII table. `A-z` matches more than just letters, and so does `À-ú` – Toto Feb 11 '20 at 19:15
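
To list what slips through, here is a small Python sketch that enumerates the non-letters inside the `A-z` range and checks two symbols inside `À-ú`. The variable names are made up for the example.

```python
import re

# Characters between 'A' and 'z' that [A-z] accepts although they are not letters.
extras = [
    chr(cp)
    for cp in range(ord("A"), ord("z") + 1)
    if re.fullmatch(r"[A-z]", chr(cp)) and not chr(cp).isalpha()
]
print(extras)

# The À-ú range likewise spans the multiplication and division signs.
signs_match = [bool(re.fullmatch(r"[À-ú]", sign)) for sign in "×÷"]
print(signs_match)
```

The six extra ASCII characters are the ones that sit between `Z` and `a` in the code table.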

score 3 · Answer 17 · answered Aug 16 '20 at 16:56

/^[A-z]+$/.test('asd')
// true

/^[A-z]+$/.test('asd0')
// false

/^[A-z]+$/.test('0asd')
// false

answered Aug 16 '20 at 16:56

jarraga

Hello @jarraga. Welcome to SO, [did you read how to answer a question?](https://www.stackoverflow.com/help/how-to-answer). It should assist the clearance of your answer, and hence avoid down voting. – ndrwnaguib Aug 16 '20 at 22:50

Predrag Davidovic · Answer 18 · 2020-07-10T10:42:12.260

2

JavaScript

If you want to return matched letters:

('Example 123').match(/[A-Z]/gi) // Result: ["E", "x", "a", "m", "p", "l", "e"]

If you want to replace matched letters with stars ('*') for example:

('Example 123').replace(/[A-Z]/gi, '*') // Result: "******* 123"

edited Jul 10 '20 at 10:42

answered Jul 10 '20 at 10:25

Predrag Davidovic

For letters beyond english: `/\p{Letter}/gu` ref: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Unicode_Property_Escapes#examples – jave.web Mar 17 '21 at 16:11

score 1 · Answer 19 · answered Jun 30 '14 at 05:36

pattern = /[a-zA-Z]/

puts "[a-zA-Z]: #{pattern.match('mine blossom')}" # OK

puts "[a-zA-Z]: #{pattern.match('456')}"

puts "[a-zA-Z]: #{pattern.match('')}"

puts "[a-zA-Z]: #{pattern.match('#$%^&*')}"

puts "[a-zA-Z]: #{pattern.match('#$%^&*A')}" # OK

answered Jun 30 '14 at 05:36

Snm Maurya

3

And what about for instance, “Zażółć gęslą jaźń”? – The Witness Apr 22 '18 at 19:25

Bersan · Answer 20 · 2023-08-17T08:55:42.787

1

The answers here either do not cover all possible letters, or are incomplete.

Complete regex to match ONLY unicode LETTERS, including those made up of multiple codepoints:

^(\p{L}\p{M}*)+$

(based on @ZoFreX comment)

Test it here: https://regex101.com/r/Mo5qdq/1
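
For engines without \p{L} and \p{M} (Python's built-in re module, for instance), the same idea can be approximated with Unicode general categories. This is a sketch under that assumption rather than an exact equivalent; the function and helper names are hypothetical.

```python
import re
import unicodedata

def _class_code(ch: str) -> str:
    """Map a code point to 'L' (letter), 'M' (mark) or 'x' (anything else)."""
    major = unicodedata.category(ch)[0]
    return major if major in "LM" else "x"

def is_letters_with_marks(text: str) -> bool:
    """Return True if text is one or more letters, each optionally followed by marks."""
    # Translate the text into a string of class codes, then apply the same
    # shape of pattern as the regex above to that string.
    codes = "".join(_class_code(ch) for ch in text)
    return re.fullmatch(r"(?:LM*)+", codes) is not None

print(is_letters_with_marks("e\u0301"))  # 'e' followed by a combining acute accent
print(is_letters_with_marks("abc1"))
```

A combining mark with no letter in front of it is rejected, which mirrors the structure of the pattern.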

edited Aug 17 '23 at 08:55

answered Jul 28 '23 at 20:02

Bersan

score 0 · Answer 21 · answered Aug 16 '23 at 15:34

This one works for me. It accepts ONLY Unicode letters, combining marks and spaces (it rejects numbers, special characters, emojis ...)

// notice: unicode: true
RegExp(r"^[\p{L}\p{M} ]*$", unicode: true)

answered Aug 16 '23 at 15:34

Erfan Eghterafi

score -2 · Answer 22 · edited May 24 '16 at 03:48

import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("^[a-zA-Z]+$");

if (pattern.matcher("a").find()) {
    // ...do something...
}

edited May 24 '16 at 03:48

Alan Moore

answered May 23 '16 at 23:26

Fikreselam Elala
