
I want to write a regex that matches a string containing 10 letters, not counting whitespace. The following strings

aaa ttt aaa           a
bbbbb aazzz
a b c d e     f g    h i j    
abcdefghij 

should match the regex.

I also saw this answer, but it deals with a specific string, not an arbitrary one. What I have tried so far

\s*[a-z]{10}$

matches only strings that end with a run of 10 consecutive letters.

Can this be achieved?

Wiktor Stribiżew
Mateut Alin
  • @WiktorStribiżew Updated the question, so if it's ok, please remove the duplicate tag. – Mateut Alin Jun 28 '19 at 08:38
  • Here's a minimal change to your current attempt, to make it "work": `^([a-z]\s+){1,10}$`. There are probably a hundred ways you could tweak this, depending on more specific requirements (e.g. I assumed you actually mean **up to 10** words, not **exactly 10 words**, and that a "word" consists only of lower case letters without punctuation, and that there would be no other punctuation in the text like `,` or `.` or `"`, ....) but hopefully that points you in the right direction. – Tom Lord Jun 28 '19 at 08:43
  • @TomLord Yes, a word consists only of lowercase letters without punctuation. The bounds in my use case are between 10 and 18, but for simplicity I asked for 10. However, I tried your regex and it works, except when the string starts with a space. So I tried `\s*^([a-z]\s+){1,10}$`, but it isn't working. By the way, you can post your comment as an answer. It's pretty close to what I need – Mateut Alin Jun 28 '19 at 08:52
  • @TomLord I found the solution. `^(\s*[a-z]\s*){1,10}$`. Thanks! – Mateut Alin Jun 28 '19 at 09:01
  • Sorry, got busy with other things. I see you came to a solution, but it is really inefficient. Use `^\s*([a-z]\s*){1,10}$` – Wiktor Stribiżew Jun 28 '19 at 09:30

1 Answer


You may use

^\s*([a-z]\s*){1,10}$

Or with a non-capturing group:

^\s*(?:[a-z]\s*){1,10}$

See the regex demo and the regex graph.

Details

  • ^ - start of a string
  • \s* - 0+ whitespaces
  • (?:[a-z]\s*){1,10} - one to ten repetitions of:
    • [a-z] - a lowercase ASCII letter
    • \s* - 0+ whitespaces
  • $ - end of string.
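
As a quick check of the breakdown above, here is a minimal Python sketch that runs the non-capturing pattern against the sample strings from the question. Python is an assumption (the question does not name a language), and `has_up_to_ten_letters` is a hypothetical helper name used only for illustration.

```python
import re

# Pattern from the answer: optional leading whitespace, then 1 to 10
# repetitions of "one lowercase ASCII letter followed by optional whitespace".
PATTERN = re.compile(r"^\s*(?:[a-z]\s*){1,10}$")


def has_up_to_ten_letters(text):
    """Return True if text holds 1 to 10 lowercase letters and otherwise only whitespace."""
    return PATTERN.search(text) is not None


# Sample strings taken from the question.
samples = [
    "aaa ttt aaa           a",
    "bbbbb aazzz",
    "a b c d e     f g    h i j    ",
    "abcdefghij ",
]

for s in samples:
    print(repr(s), has_up_to_ten_letters(s))
```

Note that `{1,10}` also accepts strings with fewer than ten letters. If exactly ten are required, the quantifier can presumably be changed to `{10}`, or to `{10,18}` for the range mentioned in the comments.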
Wiktor Stribiżew
  • Thanks! Can you tell me why my solution is inefficient? – Mateut Alin Jun 28 '19 at 11:02
  • @MTZ The `\s` on both sides of the group is optional and the group is quantified. This adds to the backtracking while the regex engine works. Avoid such patterns where the quantified group contains just 1 obligatory pattern and the rest can match an empty string and you will avoid catastrophic backtracking. – Wiktor Stribiżew Jun 28 '19 at 11:04
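
The backtracking point in the last comment can be explored with a small Python sketch. It is an illustration under assumptions, not a benchmark: the failing input is made up, `time_search` is a hypothetical helper, and the timings depend on the regex engine and the machine. The idea is that in the first pattern a run of spaces between two letters can be split between the trailing `\s*` of one iteration and the leading `\s*` of the next, so a non-matching string forces the engine to try every split.

```python
import re
import time

# The asker's pattern: whitespace is optional on BOTH sides of the letter
# inside the quantified group, so each run of spaces between two letters
# can be divided between adjacent iterations in several ways.
AMBIGUOUS = re.compile(r"^(\s*[a-z]\s*){1,10}$")

# The answer's pattern: leading whitespace is consumed once, outside the
# group, so each run of spaces can only be matched by one iteration.
UNAMBIGUOUS = re.compile(r"^\s*([a-z]\s*){1,10}$")


def time_search(pattern, text):
    """Return the wall-clock seconds a single search takes (hypothetical helper)."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


# Made-up failing input: seven letters separated by three spaces each,
# followed by a character that neither pattern accepts.
failing = "   ".join("abcdefg") + "!"

print("ambiguous:  ", time_search(AMBIGUOUS, failing))
print("unambiguous:", time_search(UNAMBIGUOUS, failing))
```

Both patterns accept and reject the same strings; only the amount of work done on a failed match differs. The input is kept short on purpose, since longer space runs or more letters make the first pattern noticeably slower.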