111

I would like to know the regex to match words that have a maximum length. For example, if a word is at most 10 characters long, I would like the regex to match, but if the length exceeds 10, the regex should not match.

I tried

^(\w{10})$

but that gives me matches only if the word is at least 10 characters long. If the word is longer than 10 characters, it still matches, but it matches only the first 10 characters.

Alfabravo
Anand Hemmige
  • Is there a reason why you don't want to simply iterate over words and use `String.length()`? – MAK Jan 28 '12 at 08:02
  • 1
    Yes. This string is part of a bigger string that contains words of several formats - dates, emails, urls etc all in a tab delimited format. I am thinking to write a composite regex to match the whole line. – Anand Hemmige Jan 28 '12 at 08:09
  • I see. Since the words are delimited by tabs, isn't it possible to split them (using `String.split()` or `StringTokenizer`) and then look at each word length? – MAK Jan 28 '12 at 08:12
  • Very much possible. In fact, that was my thought at first, but using a regex seemed straightforward then.. :) – Anand Hemmige Jan 28 '12 at 08:22

7 Answers

115

I think you want \b\w{1,10}\b. The \b matches a word boundary.

Of course, you could also replace the \b and use ^\w{1,10}$. This will match a word of at most 10 characters as long as it's the only content of the string. I think this is what you were doing before.

Since it's Java, you'll actually have to escape the backslashes: "\\b\\w{1,10}\\b". You probably knew this already, but it's gotten me before.
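A minimal sketch of how the two forms behave, assuming a hypothetical class name `MaxLengthWords` and made-up sample input. With `\b` on both sides the engine cannot stop in the middle of a long word, so an over-long word produces no match at all rather than a 10-character prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaxLengthWords {
    // Words of 1 to 10 word characters, bounded on both sides.
    private static final Pattern WORD = Pattern.compile("\\b\\w{1,10}\\b");
    // Same limit, but the word must be the entire input.
    private static final Pattern WHOLE = Pattern.compile("^\\w{1,10}$");

    // Collects every word in the text that is at most 10 characters long.
    static List<String> findWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // True only if the whole string is a single word of 1 to 10 characters.
    static boolean isShortWord(String s) {
        return WHOLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(findWords("short extraordinarily tiny")); // [short, tiny]
        System.out.println(isShortWord("abcdefghij"));  // true  (10 characters)
        System.out.println(isShortWord("abcdefghijk")); // false (11 characters)
    }
}
```

Use `findWords` when the words sit inside a longer string, and `isShortWord` when each field has already been split out.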

Wiktor Stribiżew
Tikhon Jelvis
  • Thanks. I'm sure the escaping has gotten me before as well.. :0 The expression you provided matches the 10 characters if the word is larger than 10. I do not want it to match if the word exceeds 10 characters. Sort of the opposite of \w{10,}, you could say... ! – Anand Hemmige Jan 28 '12 at 08:15
  • 1
    @AnandHemmige: Which expression? The one with a `\b` should not match anything if there are more than 10 characters in the word. The same is true for the one ending in `$`. You should try the latter if the string is just one word. – Tikhon Jelvis Jan 28 '12 at 08:58
  • 1
    In my VI version (gvim for Windows) I need a backslash (\\) before `{` for this to work. – Krisztián Balla Dec 07 '15 at 09:00
  • What is the regex to find a word that is not certain characters length? – AATHITH RAJENDRAN May 09 '21 at 08:10
66
^\w{0,10}$ # allows words of up to 10 characters.
^\w{5,}$   # allows words of more than 4 characters.
^\w{5,10}$ # allows words of between 5 and 10 characters.
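As a rough illustration in Java, the three patterns can be tried against words of different lengths. The class name `LengthBounds`, the helper `fits`, and the sample words are assumptions made for this sketch.

```java
import java.util.regex.Pattern;

public class LengthBounds {
    static final Pattern UP_TO_10 = Pattern.compile("^\\w{0,10}$");
    static final Pattern AT_LEAST_5 = Pattern.compile("^\\w{5,}$");
    static final Pattern FROM_5_TO_10 = Pattern.compile("^\\w{5,10}$");

    // True if the whole word satisfies the given length pattern.
    static boolean fits(Pattern p, String word) {
        return p.matcher(word).matches();
    }

    public static void main(String[] args) {
        // Lengths 0, 4, 7 and 12.
        String[] samples = {"", "abcd", "abcdefg", "abcdefghijkl"};
        for (String s : samples) {
            System.out.printf("%-14s upTo10=%b atLeast5=%b from5to10=%b%n",
                    "\"" + s + "\"",
                    fits(UP_TO_10, s), fits(AT_LEAST_5, s), fits(FROM_5_TO_10, s));
        }
    }
}
```

Note that `{0,10}` also accepts the empty string. If an empty field should be rejected, `{1,10}` is the safer lower bound.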
Tim Pietzcker
32

Quantifiers set the length of the text to be matched:

{n,m}  n <= length <= m
{n}    length == n
{n,}   length >= n

By default, the engine is greedy when matching this pattern. For example, if the input is 123456789, \d{2,5} will match 12345, which has length 5.

If you want the engine to stop as soon as a length of 2 is matched, use the lazy form \d{2,5}?
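The greedy and lazy behaviour described above can be checked with a short Java sketch. The class name `GreedyVsLazy` and the helper `firstMatch` are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "123456789";
        System.out.println(firstMatch("\\d{2,5}", input));  // 12345 (greedy: takes as many as allowed)
        System.out.println(firstMatch("\\d{2,5}?", input)); // 12    (lazy: stops at the minimum)
    }
}
```

Neither form rejects the 9-digit input by itself. To enforce a maximum length, the quantifier still needs anchors or word boundaries around it.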

Kleenestar
7

Method 1

Word boundaries would work perfectly here, such as with:

\b\w{3,8}\b
\b\w{2,}
\b\w{,10}\b
\b\w{5}\b

RegEx Demo 1

Java

Some languages, such as Java and C++, require the backslashes to be escaped inside string literals:

\\b\\w{3,8}\\b
\\b\\w{2,}
\\b\\w{,10}\\b
\\b\\w{5}\\b

PS: \\b\\w{,10}\\b may not work in all languages or flavors. Java's java.util.regex, for one, requires an explicit lower bound, so write {0,10} or {1,10} there.
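Whether a given flavor accepts the open lower bound is easy to probe. The sketch below does so for Java's `java.util.regex`, where, to my understanding, a counted quantifier must start with a digit. The class name `LowerBoundCheck` and the helper `compiles` are assumptions for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LowerBoundCheck {
    // True if java.util.regex accepts the pattern, false if it rejects the syntax.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\b\\w{,10}\\b"));  // open lower bound
        System.out.println(compiles("\\b\\w{0,10}\\b")); // explicit lower bound of 0
        System.out.println(compiles("\\b\\w{1,10}\\b")); // explicit lower bound of 1
    }
}
```

Writing the lower bound explicitly keeps the pattern portable across flavors that disagree on `{,m}`.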

Test 1

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class RegularExpression{

    public static void main(String[] args){


        final String regex = "\\b\\w{3,8}\\b";
        final String string = "words with length three to eight";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
        }

    }
}

Output 1

Full match: words
Full match: with
Full match: length
Full match: three
Full match: eight

Method 2

Another good-to-know method is to use negative lookarounds:

(?<!\w)\w{3,8}(?!\w)
(?<!\w)\w{2,}
(?<!\w)\w{,10}(?!\w)
(?<!\w)\w{5}(?!\w)

Java

(?<!\\w)\\w{3,8}(?!\\w)
(?<!\\w)\\w{2,}
(?<!\\w)\\w{,10}(?!\\w)
(?<!\\w)\\w{5}(?!\\w)

RegEx Demo 2

Test 2

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class RegularExpression{

    public static void main(String[] args){


        final String regex = "(?<!\\w)\\w{1,10}(?!\\w)";
        final String string = "words with length three to eight";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
        }

    }
}

Output 2

Full match: words
Full match: with
Full match: length
Full match: three
Full match: to
Full match: eight

RegEx Circuit

jex.im visualizes regular expressions.


If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.


Emma
6

I was also looking for the same regex, but I wanted to include all the special characters and blank spaces too. Here is the regex for that:

^[A-Za-z0-9\s$&+,:;=?@#|'<>.^*()%!-]{0,10}$
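A small Java sketch of this character-class approach, assuming a hypothetical class `FieldLength` and invented sample strings. Only the characters listed in the class are allowed, so anything outside it (an underscore, for instance) fails regardless of length.

```java
import java.util.regex.Pattern;

public class FieldLength {
    // Up to 10 characters drawn from letters, digits, whitespace and the listed symbols.
    private static final Pattern FIELD =
            Pattern.compile("^[A-Za-z0-9\\s$&+,:;=?@#|'<>.^*()%!-]{0,10}$");

    // True if the whole string consists of at most 10 allowed characters.
    static boolean isValid(String s) {
        return FIELD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Hi, there!"));    // true  (10 allowed characters)
        System.out.println(isValid("Hello, world!")); // false (13 characters)
        System.out.println(isValid("snake_case"));    // false (underscore is not in the class)
    }
}
```

If any character at all should be allowed, a pattern such as `^.{0,10}$` is a simpler alternative, keeping in mind that `.` does not match line terminators by default.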
Pardeep
1

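Simple, complete, and tested Java code for finding words of a certain length n:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsOfLength {
    public static void main(String[] args) {
        int n = 10; // exact word length to look for
        String regex = "\\b\\w{" + n + "}\\b";
        String str = "Hello, this is a test 1234567890";
        ArrayList<String> words = new ArrayList<>();
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            words.add(matcher.group(0)); // collect each word of exactly n characters
        }
        System.out.println(words); // prints [1234567890]
    }
}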

For more explanations and different options - see other answers.

Ronen Rabinovici
0

I liked Pardeep's answer, but I needed whole-word bounds in a string/title that can be any messed-up string an advertising dept. can think up.

\b\w([A-Za-z0-9\s$&+,:;=?@#|'<>.^*()%!-]{1,22})\b

This should iterate through a string (tested in Notepad++) and get the largest group of words in the range, i.e. 1,22 chars here, without splitting mid-word.

Here was the final command for me in Python to add some LFs:

name = re.sub(r"\b(\w[A-Za-z0-9\s$&+,:;=?@#|'<>.^*()%!-]{1,22})\b", r"\1\n", name)
Procrastinator
mxdog
  • Are you providing a solution to the question at the top of this page? Or are you describing how you solved a different one? – Yunnosch Apr 26 '22 at 21:56
  • for the question ( another way ) and it is just a variation on the answer above it ... perhaps read through the whole page and see how we got to here ? – mxdog Apr 27 '22 at 22:09
  • "the answer above it" There is no "above" here, because the display order is individually configurable and unpredictable. I get the impression that you confuse StackOverflow with a forum which has threads... But I only wanted to make sure that you intend this to be an answer to the only question on this page. Thanks. – Yunnosch Apr 28 '22 at 05:46