1
var str = "4/16/14, 10:24 AM - John Doe: How is everything going on? Check this: iPhone7!"

I want to check whether a string contains a substring that matches AM - <some-name>:. For example, in the string above it should match AM - John Doe: and return John Doe. (Of course, once it matches, I can get the name using substring.) Also, sometimes there may be special characters instead of white space in AM - John Doe:. The regular expression should work in that case too.

For example:

var str1 = "4/16/14, 10:24 AM - John Doe likes your photo";
var str2 = "4/16/14, 10:24 AM John Doe replied to your comment";
var str3 = "4/16/14, 10:24 AM John Doe: Whats going on";
var str4 = "4/16/14, 10:24 AM John Doe: Whats going on : hmmm";

The regular expression should match str3 and str4, since each contains a substring that begins with AM and ends with the first : after it.

For both str3 and str4, I want to get the name John Doe. Note: str1 and str2 contain John Doe too, but there the name is not immediately followed by :

Expressions I have tried:

str.match(/[AP]M - \w+[ ]?\w+[ ]?\w+:./);

The above fails when there are special characters such as multi-byte UTF-8 characters. They are not visible, but the string seems to contain bytes such as e2 80 80.
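A minimal sketch of the failure, assuming the invisible bytes e2 80 80 are the UTF-8 encoding of U+2000 (EN QUAD) and that they replace the spaces around the hyphen. The test strings are made up for illustration. The attempted pattern hard-codes ordinary spaces, so it stops matching as soon as one of them is a different whitespace character:

```javascript
// Assumption: e2 80 80 is U+2000 (EN QUAD) standing in for the spaces around "-".
var tried = /[AP]M - \w+[ ]?\w+[ ]?\w+:./;

var plain = "4/16/14, 10:24 AM - John Doe: How is everything going on?";
var quad = "4/16/14, 10:24 AM\u2000-\u2000John Doe: How is everything going on?";

var plainMatch = plain.match(tried);        // ordinary spaces: the pattern matches
var quadMatch = quad.match(tried);          // EN QUAD instead of spaces: no match
var quadIsWhitespace = /\s/.test("\u2000"); // \s does recognise U+2000

console.log(plainMatch !== null); // true
console.log(quadMatch);           // null
console.log(quadIsWhitespace);    // true
```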

senchaDev
  • 1
    What kind of special characters can there be? – Wiktor Stribiżew Sep 27 '16 at 06:50
  • Please provide examples of valid matches and invalid strings. – Makyen Sep 27 '16 at 06:51
  • See this question: [How do I retrieve all matches for a regular expression in JavaScript?](http://stackoverflow.com/questions/6323417/how-do-i-retrieve-all-matches-for-a-regular-expression-in-javascript) – Anderson Green Sep 27 '16 at 06:51
  • If you use a negated character class it will match any chars but the one inside the class. I guess it fits your case. – Wiktor Stribiżew Sep 27 '16 at 06:55
  • The symbol you mentioned in the question is `U+2000, EN QUAD`. According to MDN, `\s` matches [*`[ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]`*](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp). I think `/\b[AP]M\s+(?:-\s+)?([^:]+):/` should work for you. If not, try the one I currently have in the answer. – Wiktor Stribiżew Sep 27 '16 at 07:54
  • I would also ask you to provide a list of valid and invalid chars for the name. – Vladimir Drenovski Sep 27 '16 at 08:21

2 Answers

1

You may use /\b[AP]M\W+(?:-\W+)?([^:]+):/

var str1 = "4/16/14, 10:24 AM - John Doe likes your photo";
var str2 = "4/16/14, 10:24 AM John Doe replied to your comment";
var str3 = "4/16/14, 10:24 AM John Doe: Whats going on";
var str4 = "4/16/14, 10:24 AM John Doe: Whats going on : hmmm";
var ss = [ str1, str2, str3, str4 ]; // Test strings
var rx = /\b[AP]M\W+(?:-\W+)?([^:]+):/;                // Group 1 captures the name
for (var s = 0; s < ss.length; s++) {                  // Demo
  var m = ss[s].match(rx);                             // null when there is no match
  document.body.innerHTML += "Testing \"<i>" + ss[s] + "</i>\"... ";
  document.body.innerHTML += "Matched: <b>" + (m ? m[1] : "NONE") + "</b><br/>";
}

Pattern details:

  • \b - a word boundary
  • [AP]M - AM or PM
  • \W+ - 1+ non-word chars
  • (?:-\W+)? - an optional sequence of a hyphen and 1+ non-word chars
  • ([^:]+) - Group 1 (our output) capturing one or more chars other than :
  • : - a colon.

Since [^...] is a negated character class, [^:]+ matches any characters up to the first : (excluding that : from the match), while the trailing : in the pattern requires a : to actually be present in the string.
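A small sketch of how the pattern could be wrapped for use outside a browser. The helper name extractName and the last two test strings are made up for illustration; the EN QUAD (U+2000) case assumes that is the special character in question. The last call shows a caveat: [^:]+ accepts anything up to the first colon, so a colon later in the line makes the capture longer than the name.

```javascript
// Hypothetical helper around the pattern from this answer.
function extractName(line) {
  var m = line.match(/\b[AP]M\W+(?:-\W+)?([^:]+):/);
  return m ? m[1] : null; // Group 1 is the text between AM/PM and the first ":"
}

// A colon right after the name: Group 1 is the name.
console.log(extractName("4/16/14, 10:24 AM John Doe: Whats going on : hmmm")); // "John Doe"

// No colon after AM: no match.
console.log(extractName("4/16/14, 10:24 AM - John Doe likes your photo")); // null

// Assumed case: EN QUAD (U+2000) around the hyphen; \W+ consumes it.
console.log(extractName("4/16/14, 10:24 AM\u2000-\u2000John Doe: Hi")); // "John Doe"

// Caveat: a colon later in the line is still "the first colon" for [^:]+.
console.log(extractName("4/16/14, 10:24 AM - John Doe likes your photo: nice"));
// "John Doe likes your photo"
```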

Wiktor Stribiżew
  • 1
    Awesome. I have been trying for hours. This works! Thanks a ton. – senchaDev Sep 27 '16 at 07:07
  • Actually nope. It failed for str1. It matched str1 which it should not have as str1 does not have any trailing ":" – senchaDev Sep 27 '16 at 07:18
  • So, what is your pattern now? Match all the uppercased words after `"AM/PM - "`? Can there be Unicode letters, like in `Сергей Макаров`? – Wiktor Stribiżew Sep 27 '16 at 07:20
  • var str = "4/16/14, 10:24 AM - John Doe How is everything going on? Check this iPhone7!" Above exp matches this str. It should not since there is no trailing ":" after Doe – senchaDev Sep 27 '16 at 07:22
0

I have made an example, based on your regex, that catches special characters; you can find it here

As I said, I'm using your regex, modified as follows:

[AP]M[^a-zA-Z]-[^a-zA-Z]\w+[ ]?\w+[ ]?\w+:.

If you would also like to exclude digits, you can modify it as follows:

[AP]M[^a-zA-Z\d]-[^a-zA-Z\d]\w+[ ]?\w+[ ]?\w+:.

Also, if you are expecting special characters in the name, you can use \S instead of \w; this will include everything but whitespace characters. The regex then becomes:

[AP]M[^a-zA-Z]-[^a-zA-Z]\S+[ ]?\S+[ ]?\S+:.

I have updated the Regex101 example.
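A quick sketch comparing the \w and \S variants above. The test strings are assumptions for illustration: one with EN QUAD (U+2000) around the hyphen, one with a non-ASCII name, and one without a hyphen (like str3 in the question). Note that both variants require the hyphen, so they do not match the hyphen-less form.

```javascript
var withW = /[AP]M[^a-zA-Z]-[^a-zA-Z]\w+[ ]?\w+[ ]?\w+:./;
var withS = /[AP]M[^a-zA-Z]-[^a-zA-Z]\S+[ ]?\S+[ ]?\S+:./;

var quad = "4/16/14, 10:24 AM\u2000-\u2000John Doe: Hi";  // assumed special char around "-"
var accented = "4/16/14, 10:24 AM - Łukasz Sącz: Hi";     // non-ASCII letters in the name
var noHyphen = "4/16/14, 10:24 AM John Doe: Whats going on";

console.log(withW.test(quad));     // true: [^a-zA-Z] accepts U+2000
console.log(withW.test(accented)); // false: \w is ASCII-only without the u/i flag combination
console.log(withS.test(accented)); // true: \S accepts any non-whitespace character
console.log(withS.test(noHyphen)); // false: the hyphen is mandatory in these patterns
```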

  • This blacklisting approach is somewhat fragile. If there are spaces around `-`, the `[^a-zA-Z\d]` are not helpful since `\s` is quite enough. When it comes to `\w`, what if the name is `Łukasz Sącz`? It won't match then. – Wiktor Stribiżew Sep 27 '16 at 07:12
  • I would like to prove you wrong on that. First of all, \s is not enough, since the question states: `" Also, Sometimes there maybe special characters instead of white spaces in AM - John Doe:. Regular expression should work in this case also."`. Therefore \s is not enough at all. Also, if you use [\S ] instead, you will also catch any a-z chars. The point of the whole regex is to catch the special chars and white space around `-`. – Vladimir Drenovski Sep 27 '16 at 07:43