
This feels like a silly question, but I have a string like:

aaaa/bbb\/ccc

The \/ represents an escaped delimiter being used in the name of a path component.

So the string represents two path components: aaaa and bbb/ccc.

The string is created by joining path components into a single path, with / as the delimiter between components. Because / may also appear inside a component name, it is escaped as \/ whenever it does.
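For reference, the generating side might look like this. It is a hypothetical joinPath helper, assuming \ itself is escaped as \\ (as the asker explains in the comments) so that a component ending in \ stays unambiguous:

```javascript
// Hypothetical helper: build one escaped path string from component names.
// Assumes "\" is escaped as "\\" and "/" is escaped as "\/".
function joinPath(components) {
  return components
    // Escape backslashes first, so the "\" added before "/" is not doubled.
    .map((c) => c.replace(/\\/g, "\\\\").replace(/\//g, "\\/"))
    .join("/");
}

console.log(joinPath(["aaaa", "bbb/ccc"])); // aaaa/bbb\/ccc
console.log(joinPath(["str\\", "ing"]));    // str\\/ing
```

The escaping order matters: escaping / first and backslashes second would turn the \/ just added into \\/, which reads back as an escaped backslash followed by a real delimiter.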

There may be two or more components.

Using a regex like (?:\\\/|[^\/])+ is close to what I am looking for, but when considering the string this/is\/a/\/str\\/ing, it fails to split it into the components this & is\/a & \/str\\ & ing.

Instead, the final component is determined to be \/str\\/ing.
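To see why the pattern from the question fails, here is a small sketch that runs it on that input. [^\/] also matches a backslash, so the first \ of \\ is consumed as an ordinary character, and the second \ is then free to pair with the following / as an "escaped slash":

```javascript
// The question's pattern: an escaped slash, or any character except "/".
const questionRegex = /(?:\\\/|[^\/])+/g;
const input = String.raw`this/is\/a/\/str\\/ing`;

// Prints each match on its own line:
// this
// is\/a
// \/str\\/ing
input.match(questionRegex).forEach((c) => console.log(c));
```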

My question: what would the JavaScript code look like to split a path into its components when the component delimiter can also be used in the name of a component?

In the example above, I would want to end up with two strings: aaaa and bbb/ccc.

Is there a standard function that deals with this or would I need to use a regex to help me split?
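JavaScript has no built-in for this, and a regex is not strictly required. A minimal sketch of a single-pass scanner, assuming \\ encodes a literal \ and \/ a literal / (the convention described in the comments); splitPath is a hypothetical name, and empty components (for example from a//b) are kept rather than dropped:

```javascript
// Hypothetical splitter: walks the string once, splitting on unescaped "/"
// and unescaping "\\" -> "\" and "\/" -> "/". Any other "\x" pair is kept as-is.
function splitPath(path) {
  const components = [];
  let current = "";
  for (let i = 0; i < path.length; i++) {
    const ch = path[i];
    const next = path[i + 1];
    if (ch === "\\" && (next === "\\" || next === "/")) {
      current += next; // keep only the escaped character
      i++;             // and skip past it
    } else if (ch === "/") {
      components.push(current); // an unescaped "/" ends the component
      current = "";
    } else {
      current += ch;
    }
  }
  components.push(current); // the last component has no trailing "/"
  return components;
}

console.log(splitPath(String.raw`aaaa/bbb\\/ccc/ddd\/eee/fff/ggg`).join(" | "));
// aaaa | bbb\ | ccc | ddd/eee | fff | ggg
```

Because the scan goes left to right and consumes each escape pair whole, \\/ is read as an escaped backslash followed by a real delimiter, and no second pass is needed. Indexing by UTF-16 code unit is safe for any Unicode input here, since \ and / never occur inside a surrogate pair.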

Thank you.

James Hudson
  • Will there only be two components? – Unmitigated Aug 06 '21 at 21:19
  • There can be one or more components. – James Hudson Aug 06 '21 at 21:24
  • This string is generated based on a need to create a path from path components where the need is to use / as the delimiter between components and / may also appear in a component name. I hope this is good enough. This is the reason behind the need to escape / when it appears in a component name. It is being used in an electron app which may also be running in the browser. – James Hudson Aug 06 '21 at 21:26
  • This seems poor design choice. see: [Is it possible to use “/” in a filename?](https://stackoverflow.com/questions/9847288/is-it-possible-to-use-in-a-filename), but also a duplicate: [JS regexp to split string based on character not preceded by backslash](https://stackoverflow.com/questions/43831307/js-regexp-to-split-string-based-on-character-not-preceded-by-backslash) – pilchard Aug 06 '21 at 21:31
  • @pilchard these paths are not related to filenames or posix paths. the context is different. I agree, I wish another choice had been made. – James Hudson Aug 06 '21 at 21:44
  • @pilchard That other SO question is close, but not quite. I changed the regex to ```(?:\\\/|[^\/])+``` but it will not properly split ```this/is\/a/\/str\\/ing``` which should have the components ```this``` & ```is\/a``` & ```\/str\\``` & ```ing```. – James Hudson Aug 06 '21 at 21:50
  • Are there any characters that are not allowed/won't appear in the string? – pilchard Aug 06 '21 at 21:55
  • @pilchard all characters are allowed in the string, including Unicode characters. I am beginning to think the decision made was a really bad one. I can only add that it was not made by me. I argued against it and for something better (like keeping the components in an array rather than as a string). – James Hudson Aug 06 '21 at 21:56
  • Why should `aaaa/bbb\\/ccc/ddd\/eee/fff/ggg` give components ``bbb\`` and `ccc` and not `bbb\\/ccc` according to the comment in the given answer? – The fourth bird Aug 07 '21 at 09:47
  • @Thefourthbird Because ```\``` is being used as an escape character. So, to use ```\``` in a component name, it too needs to be escaped...hence ```\\``` becomes ```\``` in the component name. – James Hudson Aug 07 '21 at 12:24
  • @pilchard I would love to know what you think of my answer... – James Hudson Aug 08 '21 at 11:13
  • @Thefourthbird I would love to know what you think of my answer... – James Hudson Aug 08 '21 at 11:14

3 Answers


Using a match, you might use:

(?:[^\n\/\\]+|\\[\\\/]?)+

Explanation

  • (?: No capture group
    • [^\n\/\\]+ Match 1+ chars other than a newline, / or \
    • | Or
    • \\[\\\/]? Match \ and optional \ or /
  • )+ Close non capture group and repeat 1+ times


Then, in the matches, you can replace \/ with /.
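Note that String.prototype.replace with a string pattern replaces only the first occurrence, so a component containing two escaped slashes would keep the second one. A minimal sketch of a one-pass unescape with a global regex, assuming (per the asker's comment) that \\ also stands for a literal \; unescapeComponent is a hypothetical name:

```javascript
// Unescape one matched component in a single left-to-right pass:
// "\/" becomes "/" and "\\" becomes "\". Matches never overlap, so the
// "\\" in "\\/" is consumed first and the "/" after it is left alone.
const unescapeComponent = (component) => component.replace(/\\([\\\/])/g, "$1");

console.log(unescapeComponent(String.raw`is\/a`));   // is/a
console.log(unescapeComponent(String.raw`\/str\\`)); // /str\
```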

const regex = /(?:[^\n\/\\]+|\\[\\\/]?)+/g;
[
  String.raw `aaaa/bbb\/ccc`,
  String.raw `this/is\/a/\/str\\/ing`,
  String.raw `aaaa/bbb\\/ccc/ddd\/eee/fff/ggg`,
  String.raw `this/is\/a/dumb/str\\/ing`,
  String.raw `aaaa/\\bbb`
].forEach(s =>
  // A global regex replaces every \/ (a string pattern would replace only the first).
  console.log(Array.from(s.matchAll(regex), m => m[0].replace(/\\\//g, "/")))
);

If lookbehind is supported, you might use split with an alternation that matches the two scenarios where the string should be split.

Then you can replace \/ with / for the result array using Array map for example.

(?<=\\\\)\/|(?<!\\)\/

Explanation

  • (?<=\\\\)\/ Match / when directly preceded by \\
  • | Or
  • (?<!\\)\/ Match / when not preceded by \
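If you are not sure the engine supports lookbehind (as turned out to be the case in the comments), you can test for it at runtime. A regex literal with lookbehind is a syntax error at parse time on engines without support, which breaks the whole script, so this sketch builds the pattern from a string inside try/catch; supportsLookbehind is a hypothetical helper:

```javascript
// Hypothetical helper: true if this engine can compile a lookbehind regex.
// Unsupported engines throw a SyntaxError from the RegExp constructor.
function supportsLookbehind() {
  try {
    new RegExp("(?<=a)b");
    return true;
  } catch (e) {
    return false;
  }
}

// Build the split pattern from this answer only when it can compile;
// otherwise fall back to the match-based pattern above.
const splitter = supportsLookbehind()
  ? new RegExp(String.raw`(?<=\\\\)\/|(?<!\\)\/`)
  : null;

console.log(supportsLookbehind());
```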


[
  String.raw `aaaa/bbb\/ccc`,
  String.raw `this/is\/a/\/str\\/ing`,
  String.raw `aaaa/bbb\\/ccc/ddd\/eee/fff/ggg`,
  String.raw `this/is\/a/dumb/str\\/ing`,
  String.raw `aaaa/\\bbb`
].forEach(s => console.log(
  s
  .split(/(?<=\\\\)\/|(?<!\\)\//)
  .map(m => m.replace(/\\\//g, "/"))));
The fourth bird
  • If I understand your answer correctly, you are indicating that my regex will work (or probably does), however, if lookbehind is supported, there is a cleaner regex that can be used. I believe in your case, the second pass over the components is still required to convert ```\/``` into ```/```. – James Hudson Aug 08 '21 at 13:45
  • hummm...for some reason when I run your code snippet, it is giving me a script error. If I place ```matches = path.match(/(?<=\\\\)\/|(?<!\\)\//g);``` in my solution code, nothing is matched. Perhaps lookbehind is not supported for me. – James Hudson Aug 08 '21 at 13:50
  • @JamesHudson Then in your environment the lookbehind is not supported. I think this shortened pattern would also work. `(?:[^\n\/\\]+|\\[\\\/]?)+` See https://regex101.com/r/B6zmY2/1 – The fourth bird Aug 08 '21 at 14:00
  • Ok. Ya, no support for lookbehind. I did verify with my three test cases that your shortened pattern produces the same results. Thank you. – James Hudson Aug 08 '21 at 14:11
  • @JamesHudson I have added another example with that shortened pattern. – The fourth bird Aug 08 '21 at 14:11
  • Thank you for improving your answer. I wish I could get rid of the second pass over the individual components so they would display properly. – James Hudson Aug 08 '21 at 14:13
  • @JamesHudson I am not sure if you can do that as the `\/` part can also be in the middle of a match and you can not skip matching characters that way. – The fourth bird Aug 08 '21 at 14:22
  • I agree. The second pass is needed. I'll give you the answer because the correct regex is the key to the solution and your is better. – James Hudson Aug 08 '21 at 14:35

First, because this is JavaScript, where \ is the escape character in string literals, the string aaaa/bbb\/ccc has to be written in source as "aaaa/bbb\\/ccc" (or as String.raw`aaaa/bbb\/ccc`).
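A quick sketch of that equivalence, which also shows that in an ordinary string literal \/ is simply /, so a single backslash does not survive into the string at all:

```javascript
// Two ways to write the same 13-character string in JavaScript source.
const escapedLiteral = "aaaa/bbb\\/ccc";       // "\\" in a literal is one backslash
const rawTemplate = String.raw`aaaa/bbb\/ccc`; // String.raw keeps "\/" as typed

console.log(escapedLiteral === rawTemplate); // true
console.log(escapedLiteral.length);          // 13

// With a single backslash, "\/" is just "/": the escape is lost.
console.log("aaaa/bbb\/ccc" === "aaaa/bbb/ccc"); // true
```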

This is my current solution:

  let path;
  // Uncomment one of the other cases to try it.
  //path = "aaaa/\\bbb";
  //path = "this/is\\/a/dumb/str\\\\/ing";
  path = "aaaa/bbb\\/ccc";

  console.log("Path: ", path);

  // Each match is one component: a run of ordinary characters, escaped
  // slashes (\/), escaped backslashes (\\) or lone backslashes.
  // match() returns null when nothing matches, hence the "|| []".
  const matches = path.match(/((?:[^\/\\]|\\\/|\\\\|\\)+)/g) || [];

  console.log("M: ", matches);

  // Second pass: turn every escaped slash back into a plain "/".
  const pathComponents = matches.map((component) => component.replace(/\\\//g, "/"));

  console.log("Path Components: ", pathComponents);

  pathComponents.forEach((component) => {
    console.log(`C: ${component}`);
  });

I need to run the matches through a second pass so that they display properly. For example, the match for the source literal "bbb\\/ccc" contains the characters

bbb\/ccc

and without the second pass it would display exactly like that. It needs to display as:

bbb/ccc

Case #1

path = "aaaa/\\bbb";

I see displayed:

C: aaaa
C: \bbb

Case #2

path = "this/is\\/a/dumb/str\\\\/ing";

I see displayed:

C: this
C: is/a
C: dumb
C: str\\
C: ing

Case #3 (similar to #2)

path = "aaaa/bbb\\/ccc";

I see displayed:

C: aaaa
C: bbb/ccc
     

SUCCESS in all cases.
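A sketch that turns the three cases above into self-checking tests, so a later change to the regex can be re-verified quickly. It uses the regex from this answer; the global replace is an assumption that gives the same results as replace("\\/", "/") on these inputs, since none of them has more than one escaped slash per component:

```javascript
// Regex from this answer; each match is one escaped component.
const componentRegex = /((?:[^\/\\]|\\\/|\\\\|\\)+)/g;

// Split, then unescape every "\/" in each component.
const splitComponents = (path) =>
  (path.match(componentRegex) || []).map((c) => c.replace(/\\\//g, "/"));

// [input as a JS string literal, expected components]
const cases = [
  ["aaaa/\\bbb", ["aaaa", "\\bbb"]],
  ["this/is\\/a/dumb/str\\\\/ing", ["this", "is/a", "dumb", "str\\\\", "ing"]],
  ["aaaa/bbb\\/ccc", ["aaaa", "bbb/ccc"]],
];

for (const [path, expected] of cases) {
  const ok = JSON.stringify(splitComponents(path)) === JSON.stringify(expected);
  console.log(ok ? "PASS" : "FAIL", path);
}
```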

I believe I have caught all of the edge cases here.

Turns out this is a harder problem than I originally thought.

James Hudson
  • This answer is not entirely my own, but was developed via https://www.reddit.com/r/learnjavascript/comments/ozeic0/splitting_a_path_when_the_path_delimiter_is_used/h81poik/?context=3 – James Hudson Aug 08 '21 at 11:14

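// Replace each escaped "\/" with a "|" placeholder, split on the remaining "/",
// then restore the placeholder (assumes "|" never appears in a component).
const a = String.raw `aaaa/bbb\/ccc/ddd\/eee/fff/ggg`;
console.log(a.replace(/\\\//g, "|").split("/").map(x => x.replaceAll("|", "/")));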
Robin Webb
  • While that is close, it fails for ```aaaa/bbb\\/ccc/ddd\/eee/fff/ggg```, in which there should be components ```bbb\``` and ```ccc```. It produces a single component ```bbb\\/ccc```. – James Hudson Aug 06 '21 at 23:23
  • I would love to know what you think of my answer... – James Hudson Aug 08 '21 at 11:13
  • I think that it's a poor design choice as mentioned above. Store the separate path values in a string array or replace the / delimiter with | or a comma to avoid any ambiguity. For example, `aaaa|bbb/ccc|ddd`. Then there is no ambiguity around the / and no need to escape. I'm not sure `\/` has any effect anyway, which is part of your problem. For example const a = 'aaa\/bbb' is interpreted in the same way as const a = 'aaa/bbb'. Can you actually escape a forward slash? – Robin Webb Aug 08 '21 at 11:44
  • Or to avoid delimiter ambiguity if you absolutely must use `/` as the delimiter, use a text qualifier, i.e. `"aaaa"/"bbb/ccc"`, which is a CSV file standard approach. – Robin Webb Aug 08 '21 at 11:53
  • The text qualifier is a good idea. I will play with it and see how it works. The storage format being used is JSON. I will have to see if, like CSV, it supports the idea as well as CSV does. Of course, I believe one would need to handle a similar case where a component name included the ```"``` character as well. – James Hudson Aug 08 '21 at 13:43
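Since the storage format is JSON, the first suggestion above (store the components as an array) needs no custom escaping at all: JSON.stringify already escapes " and \ inside strings, which covers the concern about " in component names. A minimal sketch; the sample component names are made up:

```javascript
// Store the components as a JSON array instead of one delimited string.
const components = ["aaaa", "bbb/ccc", 'quote " and back\\slash'];
const stored = JSON.stringify(components);
const restored = JSON.parse(stored);

console.log(restored[1]);                         // bbb/ccc
console.log(JSON.stringify(restored) === stored); // true
```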