
I've looked everywhere but no solution seems to work for this problem. Say I've been given a String with text like this:

jx():{ww.}55<<;<5<>=-[]+*/"hrw  7 n t";fizz buzz

And I want to split this String into an array so that it looks like this (the commas separate the array positions):

"jx","(",")",":","{","ww",".","}","55","<<",";","<","5","<",">","=","-","[","]","+","*","/","hrw  7 n t",";","fizz","buzz"

This was my regular expression. I used ?= to keep the delimiters (or whatever you want to call them). It did keep them, but it did not split on a delimiter when another character came directly after it, and sometimes it did not split at all. I looked at other solutions, and everything I tried gave me the same result:

str=str.split(/(?=".*")|(?=[(){}\[\]<>:;.\=\-\*\+\/])|(?=\s{2,})|(?=[0-9]+)/)

And if I put the text above through the regular expression I get:

["jx", "(", ")", ":", "{ww", ".", "}", "5", "5", "<", "<", ";", "<", "5", "<", ">", "=", "-", "[", "]", "+", "*", "/", "\"hrw", "  ", "7 n t\"", ";fizz buzz"]

As you can see, "ww" is in the same array position as the {. At the end, the semicolon is in the same array position as fizz buzz, and fizz buzz is not split into separate positions either.
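A reduced case shows why the lookahead-only split behaves this way. The input string below is a made-up excerpt, not from the original post. A zero-width lookahead such as (?=[{}.;]) only cuts before each delimiter, never after it, so every delimiter stays attached to whatever follows it. Adding a lookbehind alternative also cuts after each delimiter. This is a minimal sketch that assumes an engine with lookbehind support (ES2018 or later). It still would not keep << or quoted text together.

```javascript
// Made-up input that reproduces both symptoms from the question.
const sample = "{ww.};fizz";

// Lookahead only: the string is cut *before* each delimiter, never after it,
// so "{" stays glued to "ww" and ";" stays glued to "fizz".
const beforeOnly = sample.split(/(?=[{}.;])/);

// Lookahead OR lookbehind: additionally cut right *after* each delimiter.
const beforeAndAfter = sample.split(/(?=[{}.;])|(?<=[{}.;])/);

console.log(beforeOnly);     // → ["{ww", ".", "}", ";fizz"]
console.log(beforeAndAfter); // → ["{", "ww", ".", "}", ";", "fizz"]
```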

The rule is as follows: split on any whitespace, but keep whitespace that is between quotes. Split on {, }, (, ), %, <<, <, >, -, +, =, ||, &&, !, ?, :, ; and ., and split on quotation marks so that the entire text between a pair of quotes ends up in one array position.

Edit: I believe I found the problem. The regular expression was treating the text between a closing quote and the next opening quote as if it were also inside quotes. Example: in "fizz" and "buzz", the word and sits between two quotation marks, so the regex returns " and " as a quoted value. How do I solve this?

Any help is appreciated. Thanks!

Tartarus13
  • Possible duplicate of [Javascript and regex: split string and keep the separator](https://stackoverflow.com/questions/12001953/javascript-and-regex-split-string-and-keep-the-separator) – Etheryte Mar 06 '18 at 16:53
  • Are you using `split` or `match`? Please post a complete code sample. – Bergi Mar 06 '18 at 17:31
  • What are the rules for splitting this `jx():{ww.}55<<;<5<>=-[]+*/"hrw 7 n t";fizz buzz` into that: `["jx", "(", ")", ":", "{ww", ".", "}", "5", "5", "<", "<", ";", "<", "5", "<", ">", "=", "-", "[", "]", "+", "*", "/", "\"hrw", " ", "7 n t\"", ";fizz buzz"]`? – felixmosh Mar 06 '18 at 17:55
  • @felixmosh The rule is as follows: split on any whitespace, but keep whitespace that is between quotes. Split on {, }, (, ), %, <<, <, >, -, +, =, ||, &&, !, ?, :, ; and ., or split on a quotation mark so that the entire text between the quotes is one array position. – Tartarus13 Mar 06 '18 at 19:43
  • @Bergi I was using split. I believe I said that: And I want to **split** this String into an array so it looks like this (the commas signifying different array positions: – Tartarus13 Mar 06 '18 at 19:45
  • @Tartarus13 Yes, yes, you want to *split* the string into parts, but that doesn't necessarily mean to use the `.split()` method. Actually using `match` with a global regex is much simpler here. – Bergi Mar 06 '18 at 19:46
  • @Bergi Could you give me a match method here then if it works? Thanks! – Tartarus13 Mar 06 '18 at 20:11
  • @Tartarus13 `str.match(/[\w\s]+|./g)` should do – Bergi Mar 06 '18 at 20:15
  • @Tartarus13 Is the `<<` token in your expected result a mistake? If no, why do those two delimiters go together? – Bergi Mar 06 '18 at 20:16
  • @Bergi It is not a mistake. I meant that if there are two < in a row (<<), keep them together; if there is only one <, split on that one. Example: `" – Tartarus13 Mar 06 '18 at 20:29
  • @Tartarus13 OK, I had not seen handling for that in your attempt. – Bergi Mar 06 '18 at 21:50
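For reference, here is what the quick match suggestion from the comments produces on the question's sample string. Only the regex comes from the comment. The variable names are illustrative. Runs of word characters and whitespace are grouped together and every other character becomes its own token. As a result, << comes out as two separate < tokens, the quotes become tokens of their own, and fizz buzz stays in one piece. That is why the << rule and the whitespace rule each need their own handling.

```javascript
const question = 'jx():{ww.}55<<;<5<>=-[]+*/"hrw  7 n t";fizz buzz';

// [\w\s]+ : a run of letters, digits, underscores and whitespace
// .       : otherwise, any single character on its own
const quickTokens = question.match(/[\w\s]+|./g);

console.log(quickTokens);
```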

1 Answer


Directly matching the things that you want is usually much easier than splitting on the things between them.

var str = 'jx():{ww.}55<<;<5<>=-[]+*/"hrw  7 n t";fizz buzz';
var regex = /"[^"]*"|<<|&&|\|\||\(|\)|\{|\}|\[|\]|<|>|:|;|\.|=|-|\*|\+|\/|\w+/g;
console.log(str.match(regex));

If you want to avoid having the quotes as part of the match, you could do something complicated with lookaround, but it's much easier to post-process the array and remove the quotes (e.g. .map(x => x.replace(/^"|"$/g, "")) or .map(x => x[0]=='"' ? x.slice(1,-1) : x)).
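Putting the match and the post-processing step together, a minimal sketch might look like the following. The helper name tokenize is hypothetical. The extra delimiters %, ! and ? are an assumption taken from the rule list in the question, because the regex above does not include them. The single-character delimiters are folded into one character class, and << is listed before it so that two consecutive < stay together. Any character that no alternative matches, such as a lone &, is silently dropped by match.

```javascript
// Quoted literal | two-character operators | one-character delimiters | words.
// %, ! and ? are assumed from the question's rule list.
const TOKEN = /"[^"]*"|<<|&&|\|\||[(){}\[\]<>:;.=\-*+\/%!?]|\w+/g;

// Hypothetical helper: tokenizes input and strips the quotes from literals.
function tokenize(input) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const tokens = input.match(TOKEN) || [];
  // Only tokens matched by "[^"]*" start with a quote; remove both quotes.
  return tokens.map(t => (t[0] === '"' ? t.slice(1, -1) : t));
}

// Prints the array the question asks for.
console.log(tokenize('jx():{ww.}55<<;<5<>=-[]+*/"hrw  7 n t";fizz buzz'));
```

Because the quoted-literal alternative consumes both of its quotes, the input from the first comment below, '{{tempknown buzz << "fizz buzz";buzzfizz;} ...', also comes out as separate tokens ("fizz buzz", ";", "buzzfizz", ";", "}", ...).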

Bergi
  • I tried your method and what came out was strange. `var str = '{{tempknown buzz << "fizz buzz";buzzfizz;} tempstr buzz << "buzzfizz"; fizz; buzz;}'.match(/(?<=")[^"]*(?=")|<<|&&|\|\||\(|\)|\{|\}|\[|\]|<|>|:|;|\.|=|-|\*|\+|\/|\w+/g); console.log(str);` The output as an array came out as follows: `["{", "{", "tempknown", "buzz", "<<", "fizz buzz", ";buzzfizz;} tempstr buzz << ", "buzzfizz", ";", "fizz", ";", "buzz", ";", "}"]` As you can see ";buzzfizz;}tempstr buzz <<" comes out as one array position. This was the problem I was having, do you have any other solutions? – Tartarus13 Mar 07 '18 at 18:18
  • I believe the reason is that it treats the text outside the quotation marks as if it were inside them. Example: in "the" and "that", and is still treated as quoted because it sits between " and ". How do I solve this? – Tartarus13 Mar 07 '18 at 18:59
  • Ah, the lookahead doesn't consume the quote, so it gets reused as the start of the next "string literal" right away. One could modify the lookahead so that it matches only every second quote in the string (like [in this example](https://stackoverflow.com/q/40479546/1048572)), but just consuming them is much simpler. – Bergi Mar 07 '18 at 19:48
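The difference is easy to reproduce on a short made-up string. The lookaround pattern is the quoted-string part of the regex from the earlier comment. The consuming pattern is the one from the answer, followed by the same slice-based post-processing. Lookbehind requires an ES2018-capable engine.

```javascript
const text = '"fizz" and "buzz"';

// Lookbehind/lookahead only *check* for a quote without consuming it, so the
// closing quote of "fizz" is reused as the opening quote of " and ".
const lookaround = text.match(/(?<=")[^"]*(?=")/g);

// Consuming both quotes means each quote can belong to only one match;
// slice(1, -1) then removes the quotes from each literal.
const consuming = text.match(/"[^"]*"/g).map(s => s.slice(1, -1));

console.log(lookaround); // → ["fizz", " and ", "buzz"]
console.log(consuming);  // → ["fizz", "buzz"]
```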