
How can this line be split while preserving quoted strings?

>div#a.more.style.ui[url="in.tray"]{value}

where the characters to split on are

> # . [ {

to yield:

>div
#a
.more
.style
.ui
[url="in.tray"]
{value}

Current effort is:

\>|\[|\{|#|\.?(?:(["'])(?:\\?.)*?\1)*

but this still splits inside the quoted "in.tray".

Update 1:

The solution needs to be regex-based, because the pattern is assembled from the keys of a JS object in the existing code, which are:

JSObject
    '>': function ...
    '^': function ...
    '[': function ...
     ...

with the functions used as callbacks to process the output from the regex.
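Because the pattern is assembled from the object's keys, each key has to be escaped before it goes into a character class (`]`, `\`, `^` and `-` are special there; `[` is escaped too for safety). Below is a minimal sketch of building the pattern and dispatching each token to its callback. `handlers`, `escapeForClass`, `buildTokenizer` and `expand` are hypothetical names. The sketch assumes every key is a single character other than `"`, and its callbacks only label tokens, since the real ones are not shown:

```javascript
// Hypothetical stand-in for the JSObject above. Keys are assumed to be
// single characters; these callbacks just label each token.
var handlers = {
  '>': function (tok) { return 'child:' + tok.slice(1); },
  '^': function (tok) { return 'up:' + tok.slice(1); },
  '#': function (tok) { return 'id:' + tok.slice(1); },
  '.': function (tok) { return 'class:' + tok.slice(1); },
  '[': function (tok) { return 'attr:' + tok.slice(1, -1); },
  '{': function (tok) { return 'text:' + tok.slice(1, -1); }
};

// Escape characters that are special inside a regex character class.
function escapeForClass(ch) {
  return ch.replace(/[\\\]\[^-]/g, '\\$&');
}

// Build a global regex from the keys. A token is either a key character
// followed by quoted strings / non-key characters, or a leading run of
// plain text that does not start with a key.
function buildTokenizer(keys) {
  var cls = keys.map(escapeForClass).join('');
  var piece = '(?:"[^"]*"|[^"' + cls + '])';
  return new RegExp('[' + cls + ']' + piece + '*|' + piece + '+', 'g');
}

// Tokenize, then pass each token to the callback registered for its
// first character; tokens without a handler pass through unchanged.
function expand(str, handlers) {
  var re = buildTokenizer(Object.keys(handlers));
  return (str.match(re) || []).map(function (tok) {
    var fn = handlers[tok.charAt(0)];
    return fn ? fn(tok) : tok;
  });
}

console.log(expand('p>div^h1#a.more[url="in.tray"]{value}', handlers));
```

Tokens are routed by their first character, so leading plain text (here `p`) has no handler and comes back as-is.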

The target string is an Emmet macro. It may start with plain characters, and it may contain repeated operators (at least ^ and $) that must be treated as separate elements, e.g.:

p>div>div>span^h2^^h1>div#a.li^mo+re.st*yle.ui[url="in.tray"]{value}$$$

Current effort, based on @tim-pietzcker's answer, using .match() with the empty last match filtered out:

[a-z$^+*>#.[{]{0,1}(?:"[^"]*"|[^"$^+*>#.[{]){0,}
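The empty last match appears because both halves of that pattern are optional (`{0,1}` and `{0,}`), so the regex can match the empty string at the end of the input. Below is a sketch that avoids the empty match by making every alternative consume at least one character. It assumes the operator set is exactly `$ ^ + * > # . [ {` from the pattern above. Unlike the original, it drops `a-z` from the leading class and handles leading plain text with a separate alternative:

```javascript
var macro = 'p>div>div>span^h2^^h1>div#a.li^mo+re.st*yle.ui[url="in.tray"]{value}$$$';

// Alternative 1: an operator followed by zero or more quoted strings or
//                non-operator characters, so a bare "^" or "$" is its own token.
// Alternative 2: a leading run of plain text that does not start with an operator.
// Each alternative consumes at least one character, so there is no empty match.
var re = /[$^+*>#.[{](?:"[^"]*"|[^"$^+*>#.[{])*|(?:"[^"]*"|[^"$^+*>#.[{])+/g;

console.log(macro.match(re));
```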

Asked by MX4399

3 Answers


Don't use split(), then it's easy:

result = subject.match(/[>#.[{](?:"[^"]*"|[^">#.[{])+/g);

See it live on regex101.com.

Explanation:

[>#.[{]     # Match a "splitting" character
(?:         # Start of group to match either...
 "[^"]*"    # a quoted string
|           # or
 [^">#.[{]  # any character except quotes and "splitting" characters
)+          # Repeat at least once.
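A short usage sketch of the same pattern shows its output and two edge cases worth knowing. Text before the first splitting character is never matched, and `match()` returns `null` rather than an empty array when nothing matches:

```javascript
var subject = '>div#a.more.style.ui[url="in.tray"]{value}';
var re = /[>#.[{](?:"[^"]*"|[^">#.[{])+/g;

// Quoted strings are consumed whole, so the "." inside "in.tray"
// never starts a new token.
console.log(subject.match(re));

// Leading text without a splitting character is dropped: only '>div' is returned.
console.log('p>div'.match(re));

// No match at all gives null, so guard with || [] if an array is needed.
console.log('plain'.match(re) || []);
```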
Tim Pietzcker
  • Hi, just curious to know whether you type "See it live on regex101.com." or have code that generates the link? It always looks the same. – aelor Mar 19 '14 at 11:22
  • @aelor: I use an [AceText](http://www.acetext.com) shortcut that contains the static text and automatically inserts the relevant URL into it. – Tim Pietzcker Mar 19 '14 at 11:24
  • @dystroy: Well, more out of habit - if I don't need to reuse a subgroup match, then I use a non-capturing group. In this case, there might even be a relevant performance benefit since capturing groups would capture many, many submatches and discard them immediately afterwards. But I haven't measured it. – Tim Pietzcker Mar 19 '14 at 11:42

It's hard to come up with a solution that uses only one regex.

I can propose this:

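var s = '>div#a.more.style.ui[url="in.tray"]{value}';
// Match either a quoted string or a run of characters outside quotes
var tokens = s.replace(/("[^"]+"|[^"\s]+)/g, function (v) {
    // Leave quoted strings untouched; mark each separator elsewhere with @@@
    return v.charAt(0) === '"' ? v : v.replace(/([.>#\[{])/g, '@@@$1');
}).split('@@@').filter(Boolean);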

(Replace @@@ with a marker you know doesn't occur in your input.)

The idea is to

  1. conceptually split the initial string into segments outside quotes and segments inside quotes (the latter keeping their quotes); this is not a real split
  2. outside the quotes, insert @@@ before each separator
  3. split the rejoined string on @@@
  4. remove the (potential) empty strings using filter
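The marker step is needed because a plain `split()` can't tell whether a separator sits inside quotes. If you'd still rather use a single `split()` call, one commonly used alternative (not part of this answer) is a lookahead that splits before a separator only when an even number of double quotes follows it. Below is a sketch, assuming quotes are balanced and never escaped. The lookahead rescans the rest of the string at every position, so the cost grows quadratically with input length:

```javascript
var s = '>div#a.more.style.ui[url="in.tray"]{value}';

// Split at the empty position before a separator, but only if the rest of
// the string holds an even number of quotes, i.e. the separator is not
// inside a quoted string.
var splitter = /(?=[>#.[{](?:[^"]*"[^"]*")*[^"]*$)/;

console.log(s.split(splitter));
```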
Denys Séguret

I do wonder if a regex is really the way to go in this case. I know this was tagged as regex, but I'd like to share a non-regex solution that simply processes each character:

var string = '>div#a.more.style.ui[url="in.tray"]{value}';
var delims = [ '>', '#', '.', '[', '{' ];
var inQuotes = false;
var parts = [];
var part = string[0]; // Start with first character

for (var i = 1; i < string.length; i++) {
  var character = string[i];

  // Toggle quote state; delimiters inside quotes are ignored
  if (character == '"') inQuotes = !inQuotes;

  if (!inQuotes && delims.indexOf(character) > -1) {
    parts.push(part);   // close the current part
    part = character;   // start a new part with the delimiter
  } else part += character;
}

// Push the final part (also covers one-character and empty input)
if (part) parts.push(part);

console.log(parts);

Output:

[ '>div',
  '#a',
  '.more',
  '.style',
  '.ui',
  '[url="in.tray"]',
  '{value}' ]

The inQuotes logic will not work for escaped quotes within quotes, e.g. "He said, \"hi there!\"", but for simple cases like this it works. You could extend it to detect an escaped quote inside a quoted string by checking whether the previous character is "\" while inQuotes is true, but there are probably better solutions to that.
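Below is a sketch of that extension, assuming backslash is the only escape character and only double quotes delimit strings. Inside quotes, a backslash marks the next character as literal, so an escaped quote (or an escaped backslash) does not end the string. `splitOutsideQuotes` is a hypothetical name. The sketch also starts from an empty part and skips empty pushes, so leading plain text and one-character inputs are handled:

```javascript
function splitOutsideQuotes(string, delims) {
  var parts = [];
  var part = '';
  var inQuotes = false;
  var escaped = false; // previous character was an unconsumed backslash inside quotes

  for (var i = 0; i < string.length; i++) {
    var ch = string[i];
    if (inQuotes) {
      if (escaped) escaped = false;          // this character is taken literally
      else if (ch === '\\') escaped = true;  // the next character is escaped
      else if (ch === '"') inQuotes = false; // an unescaped quote closes the string
      part += ch;
    } else if (ch === '"') {
      inQuotes = true;
      part += ch;
    } else if (delims.indexOf(ch) > -1) {
      if (part) parts.push(part); // close the current part, if any
      part = ch;                  // start a new part with the delimiter
    } else {
      part += ch;
    }
  }
  if (part) parts.push(part);
  return parts;
}

console.log(splitOutsideQuotes('div[title="He said, \\"hi.there\\"!"]{x}', ['>', '#', '.', '[', '{']));
```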

In terms of readability, though, I think an approach like this is preferable to a regex.

Daniël Knippers
  • +1 for the readability comment and not just throwing `regex` at the problem - but the code requires it in this case - see Update 1. – MX4399 Mar 20 '14 at 04:55