I am trying to split a string by spaces, except when the token is between quotation marks. However, the code I have written also splits the string on the . character, which I do not want. Here is my code:

string txt = "PROGRAM \"My ETABS\" VERSION \"9.7.4\" MERGETOL 0.1";

string[] split = Regex.Matches(txt, "(\\w+|\".*?\")")
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .Select(o => o.Replace("\"", ""))
                      .ToArray();

What I get:

PROGRAM  
My ETABS
VERSION 
9.7.4
MERGETOL
0
1

What I need:

PROGRAM  
My ETABS
VERSION 
9.7.4
MERGETOL
0.1

How can I modify this code to split the string by spaces, unless the token is between quotation marks, without splitting on the . character?

Vahid

1 Answer

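Swap the two alternatives and use \S in place of \w: (".*?"|\S+)
\w matches only letters, digits, and underscores, so it stops at the . in 0.1, whereas \S matches any non-whitespace character. The quoted alternative has to come first because alternation is tried left to right; otherwise \S+ would match the opening quote and stop at the first space inside the quoted text.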
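
As a rough sketch, the swapped pattern can be dropped into the pipeline from the question. The class and method names here (QuoteAwareSplit, Split) are made up for illustration; the Replace call is kept from the question to strip the quotes afterwards.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original post.
public static class QuoteAwareSplit
{
    public static string[] Split(string txt)
    {
        // Quoted alternative first, then any run of non-whitespace characters.
        return Regex.Matches(txt, "(\".*?\"|\\S+)")
                    .Cast<Match>()
                    .Select(m => m.Value.Replace("\"", "")) // strip the quotes, as in the question
                    .ToArray();
    }

    public static void Main()
    {
        string txt = "PROGRAM \"My ETABS\" VERSION \"9.7.4\" MERGETOL 0.1";
        foreach (string token in Split(txt))
            Console.WriteLine(token);
    }
}
```

For the sample string this should print PROGRAM, My ETABS, VERSION, 9.7.4, MERGETOL and 0.1, one per line. Note that Replace also removes any quote that appears inside an unquoted token.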

To get the tokens without the surrounding quotes, use "(.*?)"|(\S+) instead. For any given match
only one of the two groups contains data, so iterate over the matches (find next until done)
and concatenate the two groups of each match to get the token.
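
A minimal sketch of that find-next loop in C#, assuming the two-group pattern above. GroupSplit and Split are hypothetical names; Match.NextMatch and Group.Value are the standard .NET calls, and a group that did not participate in the match has an empty Value, so concatenating both groups yields whichever one matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class GroupSplit
{
    public static List<string> Split(string txt)
    {
        var tokens = new List<string>();
        // Group 1: text between quotes. Group 2: an unquoted token.
        Match m = Regex.Match(txt, "\"(.*?)\"|(\\S+)");
        while (m.Success)
        {
            // The group that did not match contributes an empty string.
            tokens.Add(m.Groups[1].Value + m.Groups[2].Value);
            m = m.NextMatch();
        }
        return tokens;
    }

    public static void Main()
    {
        string txt = "PROGRAM \"My ETABS\" VERSION \"9.7.4\" MERGETOL 0.1";
        foreach (string token in Split(txt))
            Console.WriteLine(token);
    }
}
```

Compared with stripping quotes via Replace, this variant only drops the quotes that delimit a token and leaves any others untouched.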

  • Since `"` is a single character, wouldn't `("[^"]*"|\S+)` be better? – Regular Jo Feb 26 '17 at 20:37
  • @cfqueryparam - Better? I don't know. They are different. I could benchmark the two if you think that would make a difference. –  Feb 26 '17 at 20:52
  • Completed iterations: 300 / 300 (x 1000). Matches found per iteration: 6. Regex1: `("[^"]*"|\S+)`, elapsed time: 2.99 s (2988.36 ms, 2988362 µs). Regex2: `(".*?"|\S+)`, elapsed time: 2.87 s (2873.48 ms, 2873477 µs). –  Feb 26 '17 at 20:58
  • As the saying goes, `touché`. I thought it would be better off being able to skip the lazy quantifier. Anyway, with either variant, this is the better answer imo. – Regular Jo Feb 26 '17 at 21:23