
Is there a regular expression reg, so that for any string str the results of str.split(".") and str.match(reg) are equivalent? If multiline should somehow matter, a solution for a single line would be sufficient.

As an example, consider the RegExp /[^\.]+/g: for the string "nice.sentance", "nice.sentance".split(".") gives the same result as "nice.sentance".match(/[^\.]+/g), namely ["nice", "sentance"]. However, this does not hold for every string. E.g. for the empty string "" the results differ: "".split(".") returns [""] while "".match(/[^\.]+/g) returns null. So /[^\.]+/g is not a solution, as it would need to work for any possible string.

The question comes from a misinterpretation of another question here, which left me wondering. I do not have a practical application for it at the moment; I am interested because I could not find an answer and it looks like an interesting RegExp problem. It may, however, be impossible.

Things I have considered:

  • IMHO it is fairly clear that reg needs the global flag, which rules out capture groups as a possibility

  • /[^\.]+/g does not match empty parts, e.g. for "", ".a" or "a..a"

  • /[^\.]*/g produces additional empty strings after non-empty matches: when the search resumes for the next match, an empty match fits right where the previous match ended. E.g. "a" yields ["a", ""]

  • With features not currently available in JavaScript (but available in other languages), one could repair the previous flaw: /(?<=^|\.)[^\.]*/g
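The first two observations can be reproduced with a short sketch. It assumes nothing beyond the two patterns from the list; the names plus, star, samples and report are made up for illustration, and [^.] is used as an equivalent spelling of [^\.].

```javascript
// Compare split(".") with match() for the two candidate patterns.
const plus = /[^.]+/g; // skips empty parts entirely
const star = /[^.]*/g; // adds an empty match after every non-empty one

const samples = ["", "a", ".a", "a..a"];
const report = samples.map(str => ({
    str,
    split: str.split("."),
    plus: str.match(plus), // null when nothing matches
    star: str.match(star),
}));

// One line per sample, so the three results can be compared side by side.
report.forEach(r => console.log(JSON.stringify(r)));
```

For "a..a", for instance, split yields three parts, while the + variant yields two and the * variant yields five.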

My conclusion would be that real empty matches need to be considered, but that without "looking behind" they cannot be told apart from the empty matches that occur between a non-empty match and the following dot or EOL. This seems a bit vague to count as a proper argument for impossibility, but maybe it is already enough. There might, however, be a RegExp feature I don't know about, e.g. one that advances the index after a match without including the symbol, or something similar that could be used as a trick.

Allowing some correction step on the array resulting from match makes the problem trivial.
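As a minimal sketch of such a correction step (one possibility, not necessarily the one intended here): match with /[^.]*/g and then drop every empty string that directly follows a non-empty match. The helper name splitViaMatch is hypothetical.

```javascript
// Hypothetical helper: emulate str.split(".") with match() plus a filter.
function splitViaMatch(str) {
    // /[^.]*/g always finds at least one (possibly empty) match,
    // so match() does not return null for this pattern.
    const raw = str.match(/[^.]*/g);
    // Each non-empty match is followed by one spurious empty match at the
    // dot (or end of string) that terminates it; remove exactly those.
    return raw.filter((part, i) => !(part === "" && i > 0 && raw[i - 1] !== ""));
}

console.log(splitViaMatch("a..a")); // same parts as "a..a".split(".")
```

The filter relies on the observation from the list above: the extra empty strings appear only immediately after a non-empty match.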


I found some related questions, which as expected utilize look-behind or capture groups though:

ASDFGerte
  • Adding some sample data which explains the problem would be helpful. – Tim Biegeleisen Jun 12 '18 at 02:13
  • Why not replace()? Doesn't split() allow regex as a parameter? – zer00ne Jun 12 '18 at 02:14
  • *"e.g. for the empty string "" they would give different results, [""]"* An empty string passed thru split() will result in an array with everything in the string as single chars. So wouldn't the equivalent be: `str.match(/./g)`? BTW, what type of result is: [""]"*? – zer00ne Jun 12 '18 at 02:26
  • added and changed an example, hopefully it is now a bit more understandable. Searched is a regex while the string operated on varies. – ASDFGerte Jun 12 '18 at 02:31
  • Where are you going to use this? – wp78de Jun 12 '18 at 05:26
  • Let alone the problem with empty string or null, it is just impossible with multi character patterns. – Wiktor Stribiżew Jun 12 '18 at 06:08

1 Answer

I do not see the point but assume you have to apply this in an environment where .split is not available.

Crafting a matching regex that does the same as .split(".") or .split(/\./) requires accounting for several cases:

  • no input => empty split
  • single . => two empty splits
  • . at the beginning => empty split at position 0
  • . at the end => empty split at the end
  • . in the middle
  • multiple consecutive .s => one empty split per .

Based on these cases, I came up with the following pattern:

^(?=\.)[^.]*|[^.]+(?=\.)|(?<=\.)[^.]*$|^$|[^.]+|(?=\.)(?<=\.)

Code Sample*:

const regex = /^(?=\.)[^.]*|[^.]+(?=\.)|(?<=\.)[^.]*$|^$|[^.]+|(?=\.)(?<=\.)/gm;
const test = `
.
.a
a.
a.a
a..a
.a.
..a..
.a.z
..`;
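const lines = test.split("\n");
lines.forEach(str => {
    console.log(`"${str}"`);      // input
    console.log(str.split("."));  // expected result
    // Collect all matches by hand, roughly what str.match(regex) does.
    let m;
    const matches = [];
    while ((m = regex.exec(str)) !== null) {
        // An empty match does not advance lastIndex; bump it to avoid an infinite loop.
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }
        matches.push(m[0]);
    }
    console.log(matches);         // regex result
});

The output should be read in blocks of three lines: input, split result, regex matches.
Within each block, the second and third lines should be the same.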

Have fun!

*Caveat: This requires RegExp Lookbehind Assertions: JavaScript Lookbehind assertions are approved by TC39 and are now part of the ES2018 standard.

RegExp Lookbehind Assertions have been implemented in V8 and shipped without flags with Google Chrome v62 and in Node.js v6 behind a flag and v9 without a flag. The Firefox team is working on it, and for Microsoft Edge, it's an implementation suggestion.
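Since support varies between engines, a runtime check may be useful before relying on the pattern. The following is only a sketch; supportsLookbehind is a hypothetical helper name, and it assumes that an engine without lookbehind support rejects the pattern when the RegExp is constructed.

```javascript
// Hypothetical helper: true if this engine can compile a lookbehind assertion.
function supportsLookbehind() {
    try {
        // Built from a string so that an unsupported pattern throws here at
        // run time instead of preventing the whole script from parsing.
        new RegExp("(?<=a)b");
        return true;
    } catch (e) {
        return false;
    }
}

console.log(supportsLookbehind());
```

A regex literal containing a lookbehind would fail at parse time on an unsupported engine, which is why the check uses the RegExp constructor.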

wp78de
  • I'll just take it as "impossible without look-behind or other unavailable features"; the current implementation status of look-behind was also interesting. PS: The question itself already discusses a regex with look-behind that is shorter and should work as well. – ASDFGerte Jun 12 '18 at 13:44
  • Yes, `/(?<=^|\.)[^\.]*/g` is probably the preferred solution. I just tried to break it down into pieces and test them one by one. More or less every alternation in my pattern addresses one of the test requirements. However, I could not find a solution that worked with ES5. – wp78de Jun 12 '18 at 16:26
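The shorter look-behind pattern from these comments can be checked against split(".") on the same inputs as the answer's sample. This sketch assumes an engine with look-behind support; the names shortRegex, cases and mismatches are made up for illustration, and agreement on these inputs is not a proof for all strings.

```javascript
// The shorter pattern from the question, written with the equivalent class [^.].
const shortRegex = /(?<=^|\.)[^.]*/g;

// Same inputs as the answer's sample, including the empty string.
const cases = ["", ".", ".a", "a.", "a.a", "a..a", ".a.", "..a..", ".a.z", ".."];

// Collect every input for which match() and split(".") disagree.
const mismatches = cases.filter(
    str => JSON.stringify(str.split(".")) !== JSON.stringify(str.match(shortRegex))
);

console.log(mismatches.length === 0 ? "all cases agree" : mismatches);
```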