I am using the following to split some text at any of .?!:;/ and keep the delimiter:

var str = 'Hello! Here is a file. It is called beatles.mp3. And so on!';
let arr = str.match(/[^\/.?!:;]+(?:[\/.?!:;]|$)/g);
// output ==> ["Hello!", " Here is a file.", " It is called beatles.", "mp3.", " And so on!"]

This works, but I'd like a way to say (while still keeping the delimiter, as I do now): "Split wherever there is a ., except where the . is followed by mp3; in that case keep the full .mp3 in the piece. Everywhere else, split at the ."

Wanted output:

["Hello!", "Here is a file.", "It is called beatles.mp3.", "And so on!"]
userjmillohara

3 Answers

You may try:

((?:\.(?!mp3)|[!&^?:;/]))

Explanation of the above regex:

  • (?:\.(?!mp3) - A non-capturing group that matches a . only if it is not followed by mp3.
  • | - Alternation.
  • [!&^?:;/] - The punctuation where a split may happen. You can add other punctuation too.
  • $1\n - In the replacement, use the captured group followed by a newline. Finally, split the result on the newlines and trim the space left at the start of each piece.

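/*
Alternative: split on the delimiters directly. Because this regex has no
capturing group, split() drops the delimiters from the output.
const regex = /(?:\. (?!mp3)|[!&^?:;/] ?)/g;
const str = `Hello! Here is a file. It is called beatles.mp3. And so on!`;
console.log(str.split(regex).filter(el => el));
*/
const regex = /((?:\.(?!mp3)|[!&^?:;/]))/gm;
const str = `Hello! Here is a file. It is called beatles.mp3. And so on!`;
const subst = `$1\n`;

// Put a newline after every delimiter the regex matches
const result = str.replace(regex, subst);

// Split at the newlines, trim the space that followed each delimiter,
// and drop the empty piece left after the final delimiter
result.split("\n")
  .map(el => el.trim())
  .filter(el => el)
  .forEach(el => console.log(el));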
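
A sketch of the same replace-then-split idea as a reusable function. The name splitOnDelimiters is hypothetical, and the approach assumes the input contains no newlines of its own, since the inserted \n is what marks the split points.

```javascript
// Hypothetical helper wrapping the replace-then-split approach from this answer.
// Assumption: the input has no newlines of its own, because "\n" marks the split points.
function splitOnDelimiters(text) {
  // A "." not followed by "mp3", or one of !&^?:;/ (captured as group 1)
  const regex = /((?:\.(?!mp3)|[!&^?:;/]))/g;
  return text
    .replace(regex, "$1\n")         // put a newline after every delimiter
    .split("\n")                    // break the string at those newlines
    .map(part => part.trim())       // remove the space that followed each delimiter
    .filter(part => part !== "");   // drop the empty piece after the final delimiter
}

const parts = splitOnDelimiters("Hello! Here is a file. It is called beatles.mp3. And so on!");
console.log(JSON.stringify(parts));
// ["Hello!","Here is a file.","It is called beatles.mp3.","And so on!"]
```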

You can try this regex:

const str = 'Hello! Here is a file. It is called beatles.mp3. And so on!';
const arr = str.match(/[^ ].+?(\.(?!mp3)|[\/?!:;])/g);

Output:

["Hello!", "Here is a file.", "It is called beatles.mp3.", "And so on!"]
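
A minimal sketch wrapping this regex in a helper; the name splitSentences is hypothetical. Because String.prototype.match returns null when nothing matches, it falls back to an empty array. As the second call shows, a trailing fragment that does not end in one of the delimiters is not matched, so it is dropped.

```javascript
// Hypothetical wrapper around the regex from this answer.
// String.prototype.match returns null when there is no match, so fall back to [].
function splitSentences(text) {
  return text.match(/[^ ].+?(\.(?!mp3)|[\/?!:;])/g) || [];
}

console.log(JSON.stringify(splitSentences("Hello! Here is a file. It is called beatles.mp3. And so on!")));
// ["Hello!","Here is a file.","It is called beatles.mp3.","And so on!"]

// A trailing fragment with no delimiter is never matched:
console.log(JSON.stringify(splitSentences("Hello! And so on")));
// ["Hello!"]
```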
Rustam D9RS

You could match any character except the delimiters, including the dot.

When you do match a dot, check that what is directly to the right is mp3. Only in that case match the dot as part of the sentence.

Repeat that until you encounter one of the delimiters .?!:;/

([^.?!:;\/]+(?:\.(?=mp3\b)[^.?!:;\/]*)*[.?!:;\/]) ?

Explanation

  • ( Capture group 1
    • [^.?!:;\/]+ Match 1+ times any char except the listed
    • (?: Non capture group
      • \.(?=mp3\b) Match . and assert that what is directly to the right is mp3 followed by a word boundary
      • [^.?!:;\/]* Match 0+ times any char except the listed
    • )* Close non capture group and repeat 0+ times
    • [.?!:;\/] Match one of the listed
  • ) ? Close group 1 and match an optional space

Each matched sentence is in capture group 1, which is m[1] in the example code.

const regex = /([^.?!:;\/]+(?:\.(?=mp3\b)[^.?!:;\/]*)*[.?!:;\/]) ?/g;
const str = `Hello! Here is a file. It is called beatles.mp3. And so on!`;
let m;

while ((m = regex.exec(str)) !== null) console.log(m[1]);
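
Since the question asks for an array, here is a sketch that collects capture group 1 of every match with String.prototype.matchAll (ES2020 and later) instead of the while loop. The regex is unchanged; the variable names are only illustrative.

```javascript
// Same regex as in this answer; matchAll requires the g flag.
const sentenceRegex = /([^.?!:;\/]+(?:\.(?=mp3\b)[^.?!:;\/]*)*[.?!:;\/]) ?/g;
const text = `Hello! Here is a file. It is called beatles.mp3. And so on!`;

// Map each match object to its capture group 1 and gather the results into an array.
const arr = Array.from(text.matchAll(sentenceRegex), m => m[1]);
console.log(JSON.stringify(arr));
// ["Hello!","Here is a file.","It is called beatles.mp3.","And so on!"]
```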
The fourth bird
  • this is so amazing, helping me out A LOT. thank you so much, i picked your answer. thanks also for the detailed explanation. i have so many regex questions still, can you recommend a good resource? I'm really impressed, thanks again! – userjmillohara Jul 06 '20 at 17:01
  • @userjmillohara You are welcome. These sites contain a lot of information https://www.regular-expressions.info/tutorial.html and https://www.rexegg.com/ SO itself also contains pages with very nice answers and explanations, for example see https://stackoverflow.com/a/22944075/5424988 – The fourth bird Jul 06 '20 at 17:08