2

I'm having trouble building a regex that retrieves the first letter of each word, plus any other capital letters inside that word:

"WelcomeBack to NorthAmerica a great place to be" = WBTNAAGPTB
"WelcomeBackAgain to NorthAmerica it's nice here" = WBATNAINH
"Welcome to the NFL, RedSkins-Dolphins play today" = WTTNFLRSDPT

I tried this just to get the first two matches:

/([A-Z])|\b([a-zA-Z])/g

Any help is welcome, thanks.

GV3

4 Answers

3

You need a regex that will match all uppercase letters and those lowercase letters that appear at the start of the string or after a whitespace:

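var re = /[A-Z]+|(?:^|\s)([a-z])/g;
var strs = ["WelcomeBack to NorthAmerica a great place to be", "WelcomeBackAgain to NorthAmerica it's nice here", "Welcome to the NFL, RedSkins-Dolphins play today"];
for (var s of strs) {
  var res = "";
  var m; // declared explicitly so it does not become an implicit global
  while ((m = re.exec(s)) !== null) {
    if (m[1]) {
      // Group 1 matched: a lowercase letter at the start of a word
      res += m[1].toUpperCase();
    } else {
      // Otherwise the whole match is a run of uppercase letters
      res += m[0];
    }
  }
  console.log(res);
}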

Here, [A-Z]+|(?:^|\s)([a-z]) matches multiple occurrences of:

  • [A-Z]+ - 1 or more uppercase ASCII letters
  • | - or
  • (?:^|\s) - start of string (^) or a whitespace (\s)
  • ([a-z]) - Group 1: one lowercase ASCII letter.
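To see what each alternative contributes, the matches can be inspected directly. This is a minimal sketch rather than part of the original answer: the helper name `describeMatches` and the sample input are made up for illustration, and it assumes an engine that provides `String.prototype.matchAll` (ES2020).

```javascript
// Same pattern as in the answer above.
const re = /[A-Z]+|(?:^|\s)([a-z])/g;

// Hypothetical helper: lists the whole match (index 0) and Group 1 for every match.
function describeMatches(text) {
  return Array.from(text.matchAll(re), (m) => ({ whole: m[0], group1: m[1] }));
}

const matches = describeMatches("the NFL it's");
console.log(matches);
```

For `it's`, the whole match includes the leading space, which is why the loop reads Group 1 instead of `m[0]` for lowercase letters. The `s` after the apostrophe is not matched at all, because it is preceded by neither the start of the string nor whitespace.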
Wiktor Stribiżew
  • The code could be simplified a tiny bit by also capturing the uppercase characters, which means that the `if` statement can be removed, as the loop will simply collect all captured content into a string (`m.slice(1).find(x=>x)`), followed by uppercasing the whole result. This will cut 4 lines out of the `while` loop and thus the `for` loop will be almost half the size. – VLAZ Sep 20 '16 at 10:37
  • Thanks, @wiktor, it works like a charm. And great explanation! – GV3 Sep 20 '16 at 11:43
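The simplification suggested in the comments could look roughly like this. It is a sketch of one reading of that comment, not code posted by either commenter; the function name `initialsSimplified` is an assumption made for illustration.

```javascript
// Hypothetical name; both alternatives now capture, so no if/else is needed.
function initialsSimplified(text) {
  const re = /([A-Z]+)|(?:^|\s)([a-z])/g;
  let res = "";
  let m;
  while ((m = re.exec(text)) !== null) {
    // m.slice(1) holds [group1, group2]; exactly one of them is defined per match.
    res += m.slice(1).find((x) => x);
  }
  // Uppercasing once at the end replaces the per-match toUpperCase() call.
  return res.toUpperCase();
}

console.log(initialsSimplified("WelcomeBackAgain to NorthAmerica it's nice here"));
```

Because the leading whitespace sits in a non-capturing group, it never reaches the result string, so no extra cleanup step is required.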
1

Try this:

let s = "WelcomeBack to NorthAmerica a great place to be";
s = s.match(/([A-Z])|(^|\s)(\w)/g);    // -> ["W", "B", " t", " N", ...]
s = s.join('');                        // -> 'WB t NA...'
s = s.replace(/\s/g, '');              // -> 'WBtNA...'
s = s.toUpperCase();                   // -> 'WBTNAAGPTB'
console.log(s);

/([A-Z])|(^|\s)(\w)/g matches every uppercase letter ([A-Z]) or (|) any word character (\w) that follows the start of the string (^) or a whitespace character (\s).

(I couldn't get the whitespace to not be captured for some reason, hence the replace step. Surely there are better tricks, but this is the most readable version I could find.)

HumanCatfood
  • If that were that easy, the whole code could be a one-liner. I already tried this approach before posting my answer. The second sample string should return `WBATNAINH`, and [your code returns `WBATNAISNH`](https://jsfiddle.net/x5vga5s1/). – Wiktor Stribiżew Sep 20 '16 at 10:12
  • You don't need to wrap the capturing groups in another group to make the OR work. It works the same without it. – VLAZ Sep 20 '16 at 10:14
  • *I couldn't get the whitespace to not be captured for some reason* - the reason is that JS regex engine does not support lookbehinds :( – Wiktor Stribiżew Sep 20 '16 at 10:39
  • How weird.. I mean you can NOT capture other stuff, right? Why does whitespace require lookbehinds? – HumanCatfood Sep 20 '16 at 10:42
  • @HumanCatfood it's not "whitespace", really. You can very well not capture stuff using `(?:)`; however, `regex.exec` will return _everything_ matched at index zero. This includes non-capturing groups. Captured groups are then placed in the rest of the indices in the returned array. The only way to _exclude_ stuff from being matched is to use a zero-length match - those will not be matched and hence not returned at index zero. A lookbehind is a zero-length match; however, JS does not support it. You can emulate it by reversing the string and the regex, then using a lookahead. – VLAZ Sep 20 '16 at 10:50
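The comments above date from 2016. Lookbehind assertions were added to JavaScript in ES2018, so on engines that support them the whitespace can be kept out of the match entirely. The sketch below shows that variant next to the reversal trick described in the last comment; both function names are assumptions made for illustration, and neither version comes from the original posters.

```javascript
// Variant 1 (assumes ES2018 lookbehind support): the lookbehind is zero-length,
// so the whitespace never becomes part of the match.
function initialsWithLookbehind(text) {
  const found = text.match(/[A-Z]+|(?<=^|\s)[a-z]/g) || [];
  return found.join("").toUpperCase();
}

// Variant 2 (works without lookbehind): reverse the string, mirror the pattern
// so the condition becomes a lookahead, then reverse the collected result.
function initialsByReversal(text) {
  const reversed = Array.from(text).reverse().join("");
  const found = reversed.match(/[A-Z]+|[a-z](?=\s|$)/g) || [];
  return Array.from(found.join("")).reverse().join("").toUpperCase();
}

const sample = "WelcomeBackAgain to NorthAmerica it's nice here";
console.log(initialsWithLookbehind(sample));
console.log(initialsByReversal(sample));
```

In the reversed string a run such as `NFL` is matched as `LFN`, and reversing the joined result restores both the letter order within runs and the order of the matches.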
1

You can use the regex /\b[a-z]|[A-Z]+/g:

<html>
   <head>
      <title>JavaScript String match() Method</title>
   </head>
   <body>
      <script type="text/javascript">
         var str = "WelcomeBack to NorthAmerica a great place to be";
         var re = /\b[a-z]|[A-Z]+/g;
         var found = str.match( re );
         found.forEach(function(item, index) {
            found[index] = item.toUpperCase();
         });
         document.write(found.join(''));
      </script>
      
   </body>
</html>
Shekhar Khairnar
  • Try with `WelcomeBackAgain to NorthAmerica it's nice here` and - after some editing - you will get to a solution closer to mine or HumanCatfood's. – Wiktor Stribiżew Sep 20 '16 at 10:42
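The difference the comment points at can be reproduced with a short comparison. This is an illustrative sketch; the names `withBoundary`, `withWhitespace`, and `collect` are assumptions, not code from the thread. A word boundary (`\b`) also occurs after an apostrophe, so the `s` in `it's` is picked up, whereas the start-or-whitespace condition skips it.

```javascript
const withBoundary = /\b[a-z]|[A-Z]+/g;            // pattern from this answer
const withWhitespace = /[A-Z]+|(?:^|\s)[a-z]/g;    // start-of-string or whitespace instead of \b

// Hypothetical helper: join all matches, drop matched whitespace, uppercase.
function collect(re, text) {
  return (text.match(re) || []).join("").replace(/\s/g, "").toUpperCase();
}

const input = "WelcomeBackAgain to NorthAmerica it's nice here";
console.log(collect(withBoundary, input));
console.log(collect(withWhitespace, input));
```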
1

You can try this; it also takes care of whitespace:

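// The wrapper name toInitials is illustrative; the original snippet assumed an enclosing function.
function toInitials(str) {
  str = str.match(/([A-Z])|(^|\s)(\w)/g);
  str = str.join('');
  str = str.replace(/\s/g, '');   // \s instead of a literal space, so tabs and newlines are removed too
  return str.toUpperCase();
}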
Mustofa Rizwan