17

I'm new to regular expressions and I'm trying to come up with a regex which matches a word which contains at least one letter and zero or more digits.

Example:

users - match

u343sers - match

13123123 - not match

I cannot use lookarounds because I need to use the regex in Go and the regexp package doesn't support them.

Is this possible without lookarounds?

AnduA

3 Answers

32

Regular expressions are built for exactly this; there is no need for lookarounds.

The Regexp

\w*[a-zA-Z]\w*

Explanation:

  • \w: any word character (letter, digit, or underscore); *: zero or more times
  • [a-zA-Z]: any single ASCII letter, uppercase A-Z or lowercase a-z
  • \w: any word character (letter, digit, or underscore); *: zero or more times
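
Since the question targets Go, here is a minimal sketch of how this pattern might be used with the regexp package. The ^ and $ anchors are an addition not present in the answer (without them, MatchString reports a match anywhere inside the input), and isWord is a hypothetical helper name chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordPattern is the answer's pattern wrapped in ^...$ (an assumption added
// here) so that the whole input must match, not just a substring of it.
var wordPattern = regexp.MustCompile(`^\w*[a-zA-Z]\w*$`)

// isWord is a hypothetical helper used only for this illustration.
func isWord(s string) bool {
	return wordPattern.MatchString(s)
}

func main() {
	// "_a_" is included to show that \w also accepts underscores.
	for _, s := range []string{"users", "u343sers", "13123123", "_a_"} {
		fmt.Printf("%q -> %v\n", s, isWord(s))
	}
}
```

With these inputs, "users", "u343sers", and "_a_" are accepted and "13123123" is rejected. If underscores should not count, a stricter character class than \w is needed.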

Online Demo:

regexr.com

Ben Aubin
  • Be careful with `[A-z]`, it matches more than letters. – Wiktor Stribiżew Nov 27 '15 at 23:25
  • and `\w` matches more than letters and numbers (it matches `_` too), but that's just me being picky. – R Nar Nov 27 '15 at 23:29
  • Ben, thank you. I had a quirky regex requirement and your regex did exactly what I needed it to do. – Chris Mendla Sep 26 '17 at 05:12
  • It fails in two ways: **(1)** It also matches `_a_`, `_A_`, `_____A`, or `A________`, which violates the requirement. The culprit is `\w`, which matches not only `0-9` but also the underscore (`_`). **(2)** It fails for letters in other locales, e.g. Greek or German letters. Please check the [answer](https://stackoverflow.com/a/60261904/10819573) addressing these issues. – Arvind Kumar Avinash Aug 19 '20 at 22:40
  • @ArvindKumarAvinash You are correct in that, if underscores should not be matched, using the more explicit `[a-zA-Z0-9]` (as in @R Nar's answer) is more precise than this answer. However, using `\w` _may_ be a better fit for some cases. – Ben Aubin Aug 20 '20 at 04:18
  • Unicode is a mess to deal with, and there's no one-size-fits-all Regexp syntax for non-ASCII letters. Additionally, it can be extremely difficult to even categorize which Unicode characters count as "letters" in languages without alphabets (such as Chinese). If you want to match letters from other languages, you'll need to find the specific syntax for your programming language and carefully consider your problem domain. – Ben Aubin Aug 20 '20 at 04:26
11

This is entirely possible without lookarounds: split the pattern into separate parts and explicitly require one letter between two optional runs of letters and digits:

[a-zA-Z0-9]*[a-zA-Z][a-zA-Z0-9]*
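
As a rough sketch of how this stricter pattern could be applied in Go: the ^ and $ anchors are an assumption added here so the entire input is tested, and isAlnumWord is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// alnumWord is the answer's pattern with ^...$ anchors added (an assumption
// made here) so the entire input must consist of ASCII letters and digits.
var alnumWord = regexp.MustCompile(`^[a-zA-Z0-9]*[a-zA-Z][a-zA-Z0-9]*$`)

// isAlnumWord is a hypothetical helper used only for this illustration.
func isAlnumWord(s string) bool {
	return alnumWord.MatchString(s)
}

func main() {
	// Unlike a \w-based pattern, this one rejects inputs with underscores.
	for _, s := range []string{"users", "u343sers", "13123123", "_a_"} {
		fmt.Printf("%q -> %v\n", s, isAlnumWord(s))
	}
}
```

Here "users" and "u343sers" are accepted, while "13123123" (no letter) and "_a_" (underscores) are rejected.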

Debuggex Demo

R Nar
  • I don't understand the downvote either. This answer is (still) more correct than the other one. – Alan Moore Nov 28 '15 at 00:31
  • It fails for letters in other locales e.g. Greek letters, German letters etc. Please check the [answer](https://stackoverflow.com/a/60261904/10819573) addressing this issue. – Arvind Kumar Avinash Aug 19 '20 at 22:40
3

If you are using Java, the regex for your requirement is [\\p{L}0-9]*\\p{L}[\\p{L}0-9]*

Explanation:

  1. \p{L} matches any single letter in any script (e.g. A-Z and a-z, as well as Greek letters or German umlauts).
  2. [\\p{L}0-9]* matches any number of letters or digits, because the quantifier * is applied to a character class consisting of letters and digits.
  3. Thus, the pattern [\\p{L}0-9]*\\p{L}[\\p{L}0-9]* means: any number of letters or digits, then a single letter, then any number of letters or digits.

Check java.util.regex.Pattern to learn more about these patterns.

Demo:

public class Main {
    public static void main(String[] args) {
        // Inputs mixing ASCII letters, non-ASCII letters, digits, underscores, and dots.
        String[] testStrings = { "A1ö1", "_a_", "1Ωω2", "123", "1", "a", "abc", "ABC", "aBc123", "123abc", "123abc123",
                "aBc123aBc", "_123", "123_", "123_123", "1_a", "_", "a_", "a_1.", "123.a", "12.56" };
        for (String s : testStrings) {
            // String.matches succeeds only if the entire string matches the pattern.
            System.out.println(s.matches("[\\p{L}0-9]*\\p{L}[\\p{L}0-9]*") ? s + " matches" : s + " does not match");
        }
    }
}

Output:

A1ö1 matches
_a_ does not match
1Ωω2 matches
123 does not match
1 does not match
a matches
abc matches
ABC matches
aBc123 matches
123abc matches
123abc123 matches
aBc123aBc matches
_123 does not match
123_ does not match
123_123 does not match
1_a does not match
_ does not match
a_ does not match
a_1. does not match
123.a does not match
12.56 does not match
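
Because the question is about Go, it may help to note that Go's regexp package also accepts the \p{L} Unicode letter class, so a similar pattern should carry over. The following is a minimal sketch under that assumption. The ^ and $ anchors stand in for the whole-string behavior of Java's String.matches, and isUnicodeWord is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// unicodeWord mirrors the Java pattern; a raw string literal is used, so the
// backslashes are not doubled. The ^...$ anchors (added here) force a
// whole-string match, similar to String.matches in Java.
var unicodeWord = regexp.MustCompile(`^[\p{L}0-9]*\p{L}[\p{L}0-9]*$`)

// isUnicodeWord is a hypothetical helper used only for this illustration.
func isUnicodeWord(s string) bool {
	return unicodeWord.MatchString(s)
}

func main() {
	// A few of the inputs from the Java demo.
	for _, s := range []string{"A1ö1", "_a_", "1Ωω2", "123", "aBc123"} {
		if isUnicodeWord(s) {
			fmt.Println(s, "matches")
		} else {
			fmt.Println(s, "does not match")
		}
	}
}
```

For these inputs the results agree with the Java output: "A1ö1", "1Ωω2", and "aBc123" match, while "_a_" and "123" do not.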
Arvind Kumar Avinash