6

So I'm making a program to parse Twitch chat, and I'm wondering if there's a way I can use regex to parse the following into the desired result:

"f o o b a r" into "foobar"

So far, the code I have is /(?:(\w)\s){3,}/g and this works to an extent, but consider the following situation:

"FrankerZ R I O T FrankerZ" captures "T" (the last letter in "R I O T") and selects "Z R I O T"

What I want is to detect single letters that have a space before and after them, and to match only when there are at least 3 of those in a row (so "test a b test" isn't selected as "ab").

Any help? Thanks!

Izzy

7 Answers

3

Try this pattern: /(?:\b\w(?:\s|$)){3,}/g

This uses the word boundary metacharacter \b so you get a proper whole word match instead of the partial match you saw with FrankerZ. Also, the \s|$ bit addresses the last letter being lost when no space comes after it, e.g., the "T" in R I O T.

Example:

var inputs = [
  "R I",
  "R I O T",
  "FrankerZ R I O T FrankerZ",
  "f o o b a r"
];

var re = /(?:\b\w(?:\s|$)){3,}/g;

inputs.forEach(function(s) {
  var match = s.match(re);
  if (match) {
    var result = match[0].replace(/\s/g, '');
    console.log('Original: ' + s);
    console.log('Result: ' + result);
  } else {
    console.log('No match: ' + s);
  }
});

Demo: JSBin

EDIT: updated to cover 3+ single letters and example of no match.

Ahmad Mageed
  • Yes this worked, but still applied to "R I" and "R I O" when I would only want it to apply to 3 or more single-letter words. I posted my answer which does something similar to this but works with my scenario. Thanks! – Izzy Jul 29 '15 at 01:26
  • Instead of the `+` quantifier, use `{3,}`. – Purag Jul 29 '15 at 01:32
  • @Flipybitz easily fixed by using `{3,}` instead of `+`. – Ahmad Mageed Jul 29 '15 at 01:32
  • @Purag yep, that's pretty much what I did in my solution but Ahmad beat me to it :P – Izzy Jul 29 '15 at 01:57
1

Thank you to Sam Burns for suggesting the use of \b. What worked for me was:

/\b((?:\w ?\b){3,})/g

This would select the following:

H Y P E from FrankerZ H Y P E FrankerZ, and all of f o o b a r (which doesn't begin or end with a space character; that was giving me issues as well).

Specifying the literal space " " character instead of \s was also important: \s also matches line breaks and other whitespace, and I only wanted to check for the space character in the first place.

To strip the spaces from the match, I'll simply do .replace(/ /g, "") to get the exact result I wanted (note that .replace(" ", "") with a plain string only removes the first space). Thanks again for everyone's help :)
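As a rough sketch of how the pattern above could be used to collapse the letters in place inside a whole chat message: the function name `collapseSpacedLetters` is made up for illustration and is not part of the original answer. One detail to be aware of is that the match can include the space that follows the last letter (for example "H Y P E " when another word follows), so this sketch puts that one space back.

```javascript
// Pattern from the answer above: 3+ single word characters,
// each optionally followed by one literal space.
var spacedLetters = /\b((?:\w ?\b){3,})/g;

// Hypothetical helper: rewrites "x H Y P E y" as "x HYPE y".
function collapseSpacedLetters(message) {
  return message.replace(spacedLetters, function (run) {
    // The run may end with the space that separates it from the next word;
    // remember it so the following word is not glued onto the result.
    var trailing = / $/.test(run) ? " " : "";
    return run.replace(/ /g, "") + trailing;
  });
}

console.log(collapseSpacedLetters("FrankerZ H Y P E FrankerZ"));
console.log(collapseSpacedLetters("f o o b a r"));
console.log(collapseSpacedLetters("test a b test"));
```

Messages with fewer than 3 single letters in a row, such as the last one, should come back unchanged.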

nhahtdh
Izzy
1

Here is a good reference on how to replace using matched groups: "Javascript replace with reference to matched group?"

So you could do:

'string'.replace(/(\s|^)((?:\w\s){2,}\w)(\s|$)/g, function(a, b, c, d) {
     return b + c.replace(/\s/g, '') + d;
});

See demo

maraca
0

You're going to have trouble solving this entire problem with regular expressions alone.

That is to say, there is no regular expression that will do all of the following:

  • select nothing you're not interested in
  • capture everything you're interested in
  • capture a variable number of matches

The last requirement -- a variable number of captures -- is the big one. StackOverflow user Tomalak described the situation quite well:

Groups are defined through parentheses. Your match result will contain as many groups as there are parentheses pairs in your regex (except modified parentheses like (?:...) which will not count towards match groups). Want two separate group matches in your match result? Define two separate groups in your regex.

If a group can match multiple times, the group's value will be whatever it matched last. All previous match occurrences for that group will be overridden by its last match.
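To make the quoted behaviour concrete, here is a small sketch that runs the pattern from the question (without the g flag, so that `exec` returns the capture groups for a single match). The variable names are only for illustration.

```javascript
// The question's pattern: one capturing group, repeated 3 or more times.
var questionPattern = /(?:(\w)\s){3,}/;

var match = questionPattern.exec("FrankerZ R I O T FrankerZ");

// The overall match spans every repetition, starting at the "Z" of "FrankerZ"...
console.log(JSON.stringify(match[0])); // "Z R I O T "

// ...but the single capturing group only keeps its last repetition.
console.log(match[1]); // T

// One pair of capturing parentheses means one group: [whole match, group 1].
console.log(match.length); // 2
```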

You can still let a regular expression do much of the work, though, for example using the \b boundary-of-word anchor. This is much like what you were describing as "a space before and after it" but is closer to what you want because it doesn't match (or even require) the space itself.

> "R I O T".match(/\b\w\b/g)
["R", "I", "O", "T"]
> "FrankerZ R FrankerZ I FrankerZ O FrankerZ T".match(/\b\w\b/g)
["R", "I", "O", "T"]

You wanted quantification, and of course this regex contains no quantifiers:

> "test a b test".match(/\b\w\b/g)
["a", "b"]

But you can do this outside of the regular expression:

var individual_letters_re = /\b\w\b/g;

function hiddenWord(sentence) {
    var letters = sentence.match(individual_letters_re);
    if (letters && letters.length >= 3) {
        return letters.join("");
    }
    return "";
}

> hiddenWord("R I O T")
"RIOT"
> hiddenWord("FrankerZ R FrankerZ I FrankerZ O FrankerZ T")
"RIOT"
> hiddenWord("test a b test")
""
> hiddenWord("test a b c test")
"abc"
RJHunter
0

Try this on your terminal/browser/console:

var text = "FrankerZ R I O T FrankerZ";
var new_text = text.replace(/(\s\S(?=\s)){3,}/g, function(w){
    return(' ' + w.replace(/\s/g, ''));
});
console.log(new_text);

Hope this does what you need.

-1

Rather than using a regex, you could make a function that takes a string, splits it at each space, and then returns all the single letters:

    function findSingleLetters(string) {
        var split = string.split(" ");
        var word = [];
        for (var i = 0; i < split.length; i++) {
            // keep only the pieces that are exactly one character long
            if (split[i].length == 1) {
                word.push(split[i]);
            }
        }
        return word.join("");  // join the single letters into one string
    }
Pindo
  • Wouldn't this just give me all single letter words? What if someone says `"this is a test, R I O T"`, it'd give you `"a,r,i,o,t"`, which is why I only want it to start capturing when there are 3+ instances of single letters in a row. Any idea how I'd include that in the code you've written? – Izzy Jul 29 '15 at 01:08
-1

\b is a zero-width assertion that matches the gap between a word character and a non-word character. For example, /\b\w\s/ matches the R in rZ R I, but not the Z: the Z does not follow a 'word break', or a switch between word and non-word characters. Try putting this at the start of your regex, to show you don't want it to start matching in the middle of a word.
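A quick way to see the difference is to run the same pattern with and without the leading \b. This is just a sketch, and the variable names are made up for illustration.

```javascript
var text = "FrankerZ R I";

// Without \b, the first "word character followed by whitespace"
// is the Z at the end of "FrankerZ".
var withoutBoundary = text.match(/\w\s/)[0];

// With \b, the match has to start at a word boundary, so the Z
// (which sits in the middle of a word) is skipped and R is found.
var withBoundary = text.match(/\b\w\s/)[0];

console.log(JSON.stringify(withoutBoundary)); // "Z "
console.log(JSON.stringify(withBoundary));    // "R "
```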

Sam Burns
  • Thank you! This was exactly what I needed, I would +rep but don't quite have enough reputation myself in order to. I'll post my answer to this. Thanks again :) – Izzy Jul 29 '15 at 01:19