5

I've been trying to solve this problem for some time, but I've been unable to. I have to make a form and I must validate the name input to have at least 3 words in it within JavaScript. I think the best way to do it is with Regex and its \b property.

This is my input:

    <input type="text" class="texto" name="Nombre" id="name" title="nombre_cliente" style="color:#888;" placeholder="Nombre del cliente" />

What I mean to do in my JavaScript code is this:

    if (document.getElementById("name").value.match(RegExCodeForMin3Words) === null) {
        alert("Name invalid");
    }

So far I've been unable to learn how to make regex match the amount of words (I'm still a beginner at Regex). Can you help me tackle this problem? Maybe Regex isn't the best option available to solve this?

Thanks!

Riccardo
  • possible duplicate of [How can check a minimum 3 characters in a given value ,using regular expression](http://stackoverflow.com/questions/4630908/how-can-check-a-minimum-3-characters-in-a-given-value-using-regular-expression) – anpsmn Apr 13 '15 at 05:54
  • No, it's not a dupe, at least not of that. Counting words ≠ counting characters. – Touffy Apr 13 '15 at 06:50

4 Answers

5

A regex that matches a string containing at least three words:

\S+\s+\S+\s+\S+

\S+ matches one or more non-space characters. If you mean word as any combination of non-space characters then you could use the above regex.
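As a rough illustration of the first pattern, here is a minimal sketch. The variable name `threeTokens` and the sample strings are made up for this example. It also shows that "word" here really means "run of non-space characters", so punctuation alone counts too.

```javascript
// Hypothetical name; the pattern itself is the one given above.
const threeTokens = /\S+\s+\S+\s+\S+/;

console.log(threeTokens.test("foo bar baz"));         // true
console.log(threeTokens.test("foo bar"));             // false
console.log(threeTokens.test("Éléonore de Guyenne")); // true, \S is not limited to ASCII
console.log(threeTokens.test("  foo bar  "));         // false, surrounding spaces add no tokens
console.log(threeTokens.test("- - -"));               // true, any non-space run counts as a "word"
```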

OR

\b\w+\b(?:.*?\b\w+\b){2}

DEMO

> /\b\w+\b(?:.*?\b\w+\b){2}/.test('foo bar buz')
true
> /\b\w+\b(?:.*?\b\w+\b){2}/.test('foo bar')
false
> /\b\w+\b(?:.*?\b\w+\b){2}/.test('foo bar bux foobar')
true

\w+ matches one or more word characters, and the surrounding \b anchors make it a single complete word. (?:.*?\b\w+\b){2} ensures that at least two more words follow the first one. The {2} quantifier repeats the preceding group exactly two times.

Avinash Raj
  • I'm afraid that won't work so well with the `\b` (check my answer and/or try it with actual data with diacritics…). – Touffy Apr 13 '15 at 06:08
  • What second option? all your examples use `\b` except the one at the top with `\S`s. It's going to count "Éléonore" as two words. – Touffy Apr 13 '15 at 06:14
  • How my first option will count `Éléonore` as two words? Explain me what's wrong with these two alternate answers i provided. I don't know what exactly op means by word, so i added both options. – Avinash Raj Apr 13 '15 at 06:15
  • Try it. That's because, in JavaScript regex implementations, `\b` only treats ascii letters as in-word characters (whereas `\w` accepts many, though not all, accented letters). – Touffy Apr 13 '15 at 06:19
  • i added `\b` alternative only because op mentioned this ` I think the best way to do it is with Regex and its \b property.`. @Touffy ya i know what are you trying to say, i already told you that i added an alternative option. – Avinash Raj Apr 13 '15 at 06:22
  • Ah, you mean your *first* option was the one without `\b`. OK. Yeah, it has other issues (hyphens, apostrophes…) but I suppose it's still better. – Touffy Apr 13 '15 at 06:27
4

Don't use a regex to look for words, it's not necessary. Just split on whitespace.

var wordCount = document.getElementById("name").value.trim().split(/\s+/).length;
if( wordCount < 3 ) { ... }

Call trim() first so that leading or trailing whitespace does not produce empty strings in the result. Then split on \s+, which matches one or more whitespace characters. split returns an array of the substrings found between delimiters, which in this case are runs of whitespace. The elements of the array are the "words", that is, whatever the input contains between spaces.
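The steps above could be wrapped in a small helper, sketched here. The function names `countWords` and `hasAtLeastWords` are hypothetical, and the empty-string check is an addition of this sketch: without it, an empty input would split into `[""]` and be counted as one word.

```javascript
// Hypothetical helper: counts whitespace-separated tokens in a string.
function countWords(text) {
  const trimmed = text.trim();
  // "".split(/\s+/) returns [""], so handle empty input explicitly.
  if (trimmed === "") {
    return 0;
  }
  return trimmed.split(/\s+/).length;
}

// Hypothetical helper: true when the text has at least `min` tokens.
function hasAtLeastWords(text, min) {
  return countWords(text) >= min;
}

console.log(countWords("  Ana María López  ")); // 3
console.log(hasAtLeastWords("foo bar", 3));     // false
```

In the form from the question, the check would then read something like `hasAtLeastWords(document.getElementById("name").value, 3)`.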

Andy Ray
  • 2
    And… how is that not using a regex? Besides, (true) regular expressions are very efficient, the reason not to use them is that there are things they can't represent well. – Touffy Apr 13 '15 at 06:16
  • "A regex to look for words" is clear wording. Regexes can be very useful, but can also be hard to read. This solution is simple and easy to read, instead of messing with optional capture groups and word boundaries, and very easy to modify for `n` words. "Efficiency" is irrelevant for this use case. – Andy Ray Apr 13 '15 at 07:22
  • 1) and yet you are using a regex as the delimiter in `split`. 2) the `{n,}` modifier makes it quite easy to change the number of words. 3) `(?:)` doesn't mean optional capture, it means no capture at all; ugly, I'll admit. 4) if you're aiming for simplicity, `` is hard to beat. – Touffy Apr 13 '15 at 09:28
3

Disclaimer: there is no 100% accurate method for tokenization (splitting words) in many languages.

You can't use \b because, unfortunately, in JavaScript it also matches a "break" around most letters with diacritics (e.g. "é"), since those letters are not treated as word characters.
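A quick way to see the problem is sketched below. The helper names are invented for this example, and the `\p{L}` variant at the end is not part of this answer: it assumes an engine with Unicode property escapes (ES2018 or later) and is shown only for comparison.

```javascript
// The \b-based pattern from the other answer.
const threeWords = /\b\w+\b(?:.*?\b\w+\b){2}/;

// Hypothetical helper: lists the ASCII-only "words" that \b and \w see.
const asciiWordRuns = (text) => text.match(/\b\w+\b/g) || [];

console.log(asciiWordRuns("Éléonore"));         // two runs: "l" and "onore"
console.log(threeWords.test("Éléonore Dupré")); // true, although the name has only two words

// Assumption: Unicode property escapes are available (requires the u flag).
const letterRuns = (text) => text.match(/\p{L}+/gu) || [];
console.log(letterRuns("Éléonore Dupré").length); // 2
```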

A simple approximation for Romance languages is to look for spaces and apostrophes.

/.+?(?:[\s'].+?){2,}/

Explanation:

  • [\s'] matches a whitespace character or an apostrophe. It can be improved as much as you want (could include punctuation etc), but the idea is that it's "stuff between words". This part is what determines the quality of the tokenizer.
  • .+? matches any non-empty string that can't be matched by anything else. It doesn't say anything about what constitutes a word.
  • (?:[\s'].+?) is just a sequence of a delimiter and a "string between delimiters" (a word, we hope). The ?: in the beginning prevents the engine from capturing the group in parentheses, but it's not really necessary. We want the parentheses to apply a quantifier to the whole sequence.
  • The final regex, .+?(?:[\s'].+?){2,} means "a word, then 2 or more times the sequence of a delimiter + a word" (total 2+1=3 words minimum).
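To make the explanation above concrete, here is a minimal sketch with made-up sample names. The variable names are hypothetical. The anchored variant is included because an HTML `pattern` attribute has to match the whole value, whereas `test` on an unanchored regex accepts a match anywhere in the string.

```javascript
// The regex explained above, unanchored.
const minThreeWords = /.+?(?:[\s'].+?){2,}/;

console.log(minThreeWords.test("Éléonore de Guyenne")); // true
console.log(minThreeWords.test("Jean d'Arc"));          // true, the apostrophe is a delimiter
console.log(minThreeWords.test("Éléonore Dupré"));      // false, only one delimiter

// Same expression anchored to the whole string.
const minThreeWordsWhole = /^(?:.+?(?:[\s'].+?){2,})$/;
console.log(minThreeWordsWhole.test("Jean d'Arc"));     // true
console.log(minThreeWordsWhole.test("Jean"));           // false
```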

Furthermore, instead of using JavaScript, you can declaratively validate your text field with the pattern attribute:

<input type="text" name="Nombre" … required pattern=".+?(?:[\s'].+?){2,}">

Touffy
-7

I guess you can use the length property of the string in JavaScript.

For example, given var s = "123456"; you can get the length with s.length.

Chris Bai