
I want to count certain characters, but they may be separated by other characters that should not be counted.

Here is an example. I want to match text that has 10 or more word characters. It may include spaces, but I don't want to count the spaces.

Should not match: "foo bar baz" (should count 9)
Should not match: "a          a" (should count 2)
Should match: "foo baz bars" (should count 10, match whole string)
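To pin down the counting rule behind these examples, here is a small non-regex sketch in JavaScript (one of the languages mentioned in the comments). It assumes "word character" means JavaScript's `\w`, i.e. `[A-Za-z0-9_]`; `countWordChars` is a hypothetical helper name.

```javascript
// Counts only word characters (\w = [A-Za-z0-9_]).
// Spaces, and anything else that is not \w, are skipped rather than counted.
function countWordChars(str) {
  const matches = str.match(/\w/g); // null when there are no word characters
  return matches ? matches.length : 0;
}

const examples = ['foo bar baz', 'a          a', 'foo baz bars'];
for (const s of examples) {
  const n = countWordChars(s);
  console.log(JSON.stringify(s), n, n >= 10 ? 'should match' : 'should not match');
}
```

This prints counts of 9, 2 and 10, matching the expected results above.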

This is what I came up with, but it counts the whole thing:

((?<=\s)*\w(?=\s)*){10}

Edit: I do not want to include spaces in the count. Sorry, I edited this a few times because I didn't describe it correctly.

Any ideas on this?

jomo
  • So do you want an array of `foo,baz,bars` when there are 10 or more word characters? What language are you using: PHP, JS, Perl...? – MDEV Aug 09 '13 at 10:49
  • You say you don't want to count the spaces, then that you "don't want to match them, just to count them". Please clarify. Give example input and desired match and output. – instanceof me Aug 09 '13 at 10:53
  • PHP: `var_dump(preg_match('#\w{10,}#',str_replace(' ','',$str)));` i.e. remove the spaces, then check – MDEV Aug 09 '13 at 10:56
  • Sorry, my original question was a bit unclear. I edited it. I'm using JS and/or Ruby. – jomo Aug 09 '13 at 10:57
  • @HansWürstchen See my answer, I think you're after that – MDEV Aug 09 '13 at 10:59

3 Answers


Hey, I think this would be a simple but working one:

( *?[0-9a-zA-Z] *?){10,}

Breaking the regex down:

  1. `( *?` : it can start with zero or more spaces
  2. `[0-9a-zA-Z]` : followed by one alphanumeric character
  3. ` *?)` : it can end with zero or more spaces
  4. `{10,}` : the group must match 10 or more times

Key point: a quantifier such as `{10,}` applies to the whole group, i.e. everything inside the parentheses "()". In this case, any spaces, followed by ONE alphanumeric character, followed by any spaces, count as a single repetition, so only the alphanumeric characters are effectively counted. Hope it helps. :)
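A quick JavaScript check of this pattern against the three examples from the question (the regex is used exactly as given in the answer):

```javascript
// Each repetition = optional spaces, ONE alphanumeric character, optional spaces.
const atLeastTen = /( *?[0-9a-zA-Z] *?){10,}/;

console.log(atLeastTen.test('foo bar baz'));   // false (9 alphanumerics)
console.log(atLeastTen.test('a          a'));  // false (2)
console.log(atLeastTen.test('foo baz bars'));  // true  (10)

// The matched text is the span consumed by the repetitions, spaces included.
console.log('foo baz bars'.match(atLeastTen)[0]); // 'foo baz bars'
```

Because the pattern is not anchored, it also succeeds when ten such characters appear anywhere inside a longer string; wrapping it in `^` and `$` turns it into a whole-string test.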

Juto
  • It works! Thank you! Sorry for the late answer. I think you could compact it like this: `(\s*?[\w]\s*?){10,}` – jomo Aug 10 '13 at 20:33
  • @HansWürstchen Agreed on replacing `[0-9a-zA-Z]` with `[\w]`. But replacing `" "` with `\s` will match more than it should, depending on whether you read from a file (multiple lines) or from some input (one line): `\s` also matches `\n`, so characters on different lines get counted together, which should not be the case. See it in action: [a](http://regexr.com?35tq2) versus mine: [b](http://regexr.com?35tpv) – Juto Aug 12 '13 at 09:32
  • In the breakdown, number three is missing a space at the beginning. It should look like ` *?)` and not `*?)`. Also, there seems to be a bug in SO's backticks... – Arvo Bowen Jan 09 '20 at 15:09
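A small JavaScript sketch of the `\s` versus `" "` point from these comments; the two-line input string is made up for illustration.

```javascript
const spacesOnly = /( *?[0-9a-zA-Z] *?){10,}/; // answer's version: only ' ' is skipped
const anyWhitespace = /(\s*?\w\s*?){10,}/;     // compact version: \s also skips \n, \t, ...

const twoLines = 'foo\nbaz bars'; // 10 word characters in total, split by a newline

console.log(spacesOnly.test(twoLines));    // false: no single line has 10
console.log(anyWhitespace.test(twoLines)); // true: counting continues across the newline
```

Which behavior is right depends on whether a line break should end a run; for single-line input the two patterns agree on the question's examples.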

Use a group that consumes any spaces together with each single word character, and count the groups:

^(\s*\w){10,}\s*$
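Checked in JavaScript against the question's examples (the regex is used as given; the last input is made up to show the effect of the anchors):

```javascript
// Each group: any amount of whitespace, then exactly one word character.
// ^ and $ require the whole string to be made of such groups (plus trailing whitespace).
const tenOrMore = /^(\s*\w){10,}\s*$/;

console.log(tenOrMore.test('foo bar baz'));   // false (9)
console.log(tenOrMore.test('a          a'));  // false (2)
console.log(tenOrMore.test('foo baz bars'));  // true  (10)
console.log(tenOrMore.test('foo baz bars!')); // false: '!' is neither \s nor \w
```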
Bohemian
  • "More than 10" => `{10,}` if you match `$`, or just simplify: `^(\s*\w){10}`. On a side note, the groups don't have to be capturing groups: `^(?:\s*\w){10}`. – instanceof me Aug 09 '13 at 10:49
  • @streetpc Yeah, just noticed that. I've added the comma. Good point about just dropping the `$`. I guess it depends on whether he wants to prevent junk at the end or not. – Bohemian Aug 09 '13 at 10:53
  • Sorry it was a bit unclear in my original question. I edited it now. I don't want to count spaces. So this string should not match "a" because it has only one character that matches. – jomo Aug 09 '13 at 10:58
  • Hans, this regex won't match lots of spaces then a letter. The input must have at least 10 letters somewhere in it. – Bohemian Aug 09 '13 at 16:31
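The trade-off discussed in these comments (keep `$` to reject trailing junk, or drop it and use a non-capturing group) can be seen in a short JavaScript sketch; the sample input is made up.

```javascript
const withEnd = /^(\s*\w){10,}\s*$/; // whole string must be whitespace + word characters
const prefixOnly = /^(?:\s*\w){10}/; // only the first 10 counted characters are checked

const input = 'foo baz bars, and more';
console.log(withEnd.test(input));        // false: ',' is neither \s nor \w
console.log(prefixOnly.test(input));     // true
console.log(input.match(prefixOnly)[0]); // 'foo baz bars'
```

Note that both versions are anchored with `^`, so the counted characters must come first in the string.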

Using JS: remove the spaces, then do the `\w` check:

'foo baz barz'.replace(/ /g,'').match(/\w{10,}/) != null //true
'foo bar baz'.replace(/ /g,'').match(/\w{10,}/) != null //false
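The same check wrapped in a function, using `RegExp.prototype.test`, which returns a boolean directly instead of comparing `match()` with `null`. `hasTenWordChars` is a hypothetical name; the last input is made up to show a limitation of this approach.

```javascript
// Strip spaces first, then look for 10 consecutive word characters.
function hasTenWordChars(str) {
  return /\w{10,}/.test(str.replace(/ /g, ''));
}

console.log(hasTenWordChars('foo baz barz')); // true
console.log(hasTenWordChars('foo bar baz'));  // false
console.log(hasTenWordChars('foo-baz bars')); // false: '-' is not removed, so only 'bazbars' (7) is contiguous
```

Only spaces are stripped, so any other separator still breaks the run of word characters.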

Match phone numbers in text:

var test = 'something foo baz barz 07999-777-111 and 01234 567890 01234567890 some more'.match(/((\(?0\d{4}\)?[ -]?\d{3}[ -]?\d{3})|(\(?0\d{3}\)?[ -]?\d{3}[ -]?\d{4})|(\(?0\d{2}\)?[ -]?\d{4}[ -]?\d{4}))([ -]?\#(\d{4}|\d{3}))?/g);
//result: ["07999-777-111", "01234 567890", "01234567890"]
MDEV
  • This would probably do the trick. In general you would first remove the characters that you don't want to match and then do the counting match. – jomo Aug 09 '13 at 11:29
  • Hmm, now that I think about it again, I am not sure how to select the original phrase, i.e. if I wanted to replace the original text with something else. – jomo Aug 09 '13 at 11:40
  • @HansWürstchen You mean replace the `foo baz barz` with something else? – MDEV Aug 09 '13 at 11:41
  • @HansWürstchen If you're testing the whole string for being more than 10 word characters, then it's just `if str matches the code above, then str = new_string` - as this isn't testing for a matching substring, but rather the whole string. So the whole string is replaced and the `match()` data isn't needed – MDEV Aug 09 '13 at 11:44
  • I used the example above for simplicity. It should also work when it matches only a part of a string. For example matching a phone number in a text, where the phone number may be separated by spaces and dashes but has to have at least 5 digits in it. – jomo Aug 09 '13 at 11:51
  • @HansWürstchen but if it's matching **10 or more** characters, it will end up matching the whole string if part of it matches – MDEV Aug 09 '13 at 11:55
  • @HansWürstchen In which case you want a much more specific regex, rather than a generic one like this - a standard phone regex is `/((\(?0\d{4}\)?\s?\d{3}\s?\d{3})|(\(?0\d{3}\)?\s?\d{3}\s?\d{4})|(\(?0\d{2}\)?\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?/` – MDEV Aug 09 '13 at 12:04
  • Well, then forget the example and imagine what I said above. – jomo Aug 09 '13 at 12:06
  • My original question was how to match a string with a count of characters, where some other characters can be included but don't count. Your answer works for the example I made, but I don't think it works in the general case. – jomo Aug 09 '13 at 12:13
  • @HansWürstchen It can't. If you're generally matching 10 or more, the "more" bit will extend and match the whole string. If you're finding phone numbers, that's a specific rule in itself. Some problems just don't have a "1 size fits all" solution. If you show us exactly what you've got and what you want to do with it (them), then it'll be easier, but with the lack of information at the moment, I've answered what I can – MDEV Aug 09 '13 at 12:17
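A sketch of the general case jomo describes (a run of at least 5 digits that may be split by spaces and dashes, found inside a longer text). This is not from any answer above: the pattern `\d(?:[ -]*\d){4,}` is an assumption that reuses the "count a group that absorbs separators" idea from the first two answers, and the sample text is made up. Since it matches in the original string, `replace` can swap out the spaced-out phrase directly.

```javascript
// Hypothetical rule: at least 5 digits, optionally separated by spaces or dashes.
// The match starts and ends on a digit, so surrounding separators are left alone.
const spacedDigits = /\d(?:[ -]*\d){4,}/g;

const text = 'call 0 79-99 7 now, or 12 34, ref 1-2-3-4-5-6';
const found = text.match(spacedDigits);
console.log(found); // [ '0 79-99 7', '1-2-3-4-5-6' ]

// Replacement works on the original, unstripped text.
console.log(text.replace(spacedDigits, '[number]'));
// 'call [number] now, or 12 34, ref [number]'
```

As MDEV points out, an open-ended count like `{4,}` keeps extending as long as digits continue, so a real phone-number matcher would still need more specific rules.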