I am trying to create a regex in Ruby that matches strings containing at least 10 characters that are not special characters, i.e. characters that `\w` would match.
So far I have come up with this:
/\w{10,}/
but the issue is that it only counts a consecutive sequence of word characters. I want to match any string that contains at least 10 "word" characters in total, consecutive or not. Is this possible? I am fairly new to regex as a whole so any help would be appreciated.

- Can you include some example strings and the part(s) that you want / don't want to match? – Stefan May 17 '21 at 16:31
- Your question is not clear. You are given a string `str`. Do you merely wish to determine if `str` contains at least 10 word characters? If so, `str.scan(/\w/).size >= 10` would suffice. If you wish to extract all strings that contain 10 or more word characters you need to clarify whether `"12345678901234567890"` contains one such string, two strings (`"1234567890"` and `"1234567890"`) or 11 (possibly overlapping) strings (`"1234567890"`, `"2345678901"`, etc.). Please edit to clarify your question. – Cary Swoveland May 17 '21 at 18:38
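For the plain yes/no case, the counting approach from the comment above could be sketched as follows. The helper name `enough_word_chars?` and the sample strings are made up for illustration, and "at least 10" is taken to mean `>=`.

```ruby
# Hypothetical helper (not from the thread): true when str contains at least
# `min` word characters in total, whether they are consecutive or not.
def enough_word_chars?(str, min = 10)
  str.scan(/\w/).size >= min
end

puts enough_word_chars?("ab-cd ef!gh ij")  # 10 word characters => true
puts enough_word_chars?("ab-cd ef!gh i")   # 9 word characters  => false
```

This avoids a counting regex altogether, but it only answers whether the string qualifies; it does not extract the matching portion.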
2 Answers
If I understood correctly, this should work:
/(?:\w[^\w]*){9,}\w/
Explanation:
We start with a single `\w`. We want to match all the other characters up to the next `\w`, hence:
\w[^\w]*
`[^<list of chars>]` matches any character other than those listed in the brackets, so `[^\w]` means any character that is not a word character. `*` denotes 0 or more. The above will match `"a-- "`, `"b"` and `"c!"` in the string `"a-- bc!"`.
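A quick way to check this chunking is `String#scan`, which returns every match of the pattern. The constant name below is only for illustration.

```ruby
# Each match is one word character plus any non-word characters that follow it.
CHUNKS = "a-- bc!".scan(/\w[^\w]*/)

p CHUNKS  # => ["a-- ", "b", "c!"]
```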
Since we need 10 `\w`, we match 9 (or more) groups like that, followed by a single `\w`:
(\w[^\w]*){9,}\w
We don't really care about captures here (Ruby keeps only the last repetition of a repeated capture group anyway), so we make the group non-capturing:
(?:\w[^\w]*){9,}\w
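As a rough check, here is how this pattern compares with the original `/\w{10,}/` on strings whose word characters are scattered. The constant, the helper name and the sample strings are assumptions made for this sketch.

```ruby
# Pattern from this answer: 9+ "word char plus trailing junk" groups, then one more \w.
AT_LEAST_TEN = /(?:\w[^\w]*){9,}\w/

# Hypothetical helper for illustration.
def ten_word_chars?(str)
  str.match?(AT_LEAST_TEN)
end

puts ten_word_chars?("ab-cd ef!gh ij")      # 10 scattered word characters => true
puts ten_word_chars?("ab-cd ef!gh i")       # only 9 word characters       => false
puts "ab-cd ef!gh ij".match?(/\w{10,}/)     # original attempt             => false
```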
Alternatively, we could use a simpler regex:
(?:\w[^\w]*){10,}
But its match will also include any non-word characters that follow the last word character - not sure if this is required here.
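The difference between the two variants only shows up in the matched text, not in whether a match is found. A small sketch, with constant names and a sample string invented for the purpose:

```ruby
STRICT = /(?:\w[^\w]*){9,}\w/   # match ends on the last word character
LOOSE  = /(?:\w[^\w]*){10,}/    # match also swallows trailing non-word characters
SAMPLE = "ab-cd ef!gh ij ???"

p SAMPLE[STRICT]  # => "ab-cd ef!gh ij"
p SAMPLE[LOOSE]   # => "ab-cd ef!gh ij ???"
```

If the regex is only used as a yes/no test, either form should do.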

- Thanks, it seems to do what I want. Could you briefly explain how this works though? Struggling to understand it. – perrywinkle May 17 '21 at 15:57
- Thanks for the explanation, I understand the logic now. One last thing: what if we wanted to put a maximum number on this? For example, if we wanted a limit of 20 "word" characters, would we just put `{10,20}`? – perrywinkle May 17 '21 at 16:02
- It depends on how you want to use that limit. If this is for validation, you'd additionally need to wrap it between `\A` and `\z`. If you use it for scanning, just adding the limit to the range would work. – BroiSatse May 17 '21 at 16:06
- Can you not use `\W` in place of `[^\w]`? btw, that pianist looks quite a bit younger than you. – Cary Swoveland May 17 '21 at 17:15
- @CarySwoveland - Good point. I always have doubts whether `\W` is a simple negation of `\w` or not! But it seems it is, I'll update the answer. :) And yes, that photo was taken almost 10 years ago now. I think it is time to update... – BroiSatse May 18 '21 at 11:09
Match anywhere in the string:
/\w(?:\W*\w){9,19}/
/(?:\W*\w){10,20}/
Validate a string containing 10 to 20 word characters:
/\A(?:\W*\w){10,20}\W*\z/
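To see why the anchors matter for the upper bound, here is a minimal sketch. The constant, the helper name and the sample strings are assumptions for illustration.

```ruby
TEN_TO_TWENTY = /\A(?:\W*\w){10,20}\W*\z/

# Hypothetical helper: true when str has between 10 and 20 word characters overall.
def valid_word_count?(str)
  str.match?(TEN_TO_TWENTY)
end

puts valid_word_count?("ab-cd ef!gh ij")   # 10 word characters => true
puts valid_word_count?("ab-cd ef!gh i")    # 9 word characters  => false
puts valid_word_count?("a-" * 20)          # 20 word characters => true
puts valid_word_count?("a" * 21)           # 21 word characters => false

# Without \A and \z the upper limit is not enforced on the whole string:
puts ("a" * 21).match?(/(?:\W*\w){10,20}/) # => true
```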
Prefer non-capturing groups, particularly when extracting found matches.
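The reason is that `String#scan` returns the captured groups instead of the whole match as soon as the pattern contains a capture group. A shortened sketch (the repetition count is reduced to `{2}` and the sample string is made up, just to keep the output small):

```ruby
SHORT_SAMPLE = "a-b c"

# With a capture group, scan returns the group contents (last repetition only).
p SHORT_SAMPLE.scan(/(\W*\w){2}/)    # => [["-b"]]

# With a non-capturing group, scan returns the full matches.
p SHORT_SAMPLE.scan(/(?:\W*\w){2}/)  # => ["a-b"]
```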
Watch out for `^` and `$`: in Ruby's regex they mark the start and end of a line, not of the whole string.
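A small sketch of the pitfall, using a made-up two-line input whose total word character count is well over the limit:

```ruby
LINE_ANCHORED   = /^(?:\W*\w){10,20}\W*$/
STRING_ANCHORED = /\A(?:\W*\w){10,20}\W*\z/

# First line has 10 word characters, second line has 30 (40 in total).
MULTILINE_INPUT = "abcdefghij\n" + "x" * 30

puts MULTILINE_INPUT.match?(LINE_ANCHORED)    # => true  (first line alone satisfies it)
puts MULTILINE_INPUT.match?(STRING_ANCHORED)  # => false (whole string is checked)
```

For validation, `\A` and `\z` are therefore the safer choice.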
EXPLANATION
--------------------------------------------------------------------------------
  \A                       the beginning of the string
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (between 10 and
                           20 times (matching the most amount
                           possible)):
--------------------------------------------------------------------------------
    \W*                      non-word characters (all but a-z, A-Z,
                             0-9, _) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
  ){10,20}                 end of grouping
--------------------------------------------------------------------------------
  \W*                      non-word characters (all but a-z, A-Z,
                           0-9, _) (0 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  \z                       the end of the string
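The same breakdown can be kept next to the pattern itself by using Ruby's extended (`/x`) mode, which ignores whitespace and allows `#` comments inside the regex. This is only a sketch; the constant name `VALIDATOR` is an assumption.

```ruby
VALIDATOR = /
  \A          # beginning of the string
  (?:         # non-capturing group, repeated 10 to 20 times
    \W*       #   zero or more non-word characters
    \w        #   exactly one word character
  ){10,20}
  \W*         # any trailing non-word characters
  \z          # end of the string
/x

puts VALIDATOR.match?("ab-cd ef!gh ij")  # => true
puts VALIDATOR.match?("too short")       # => false
```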
