
I tried a solution for matching commas outside quotes using regexp in MATLAB (Mac OS X):

str='"This string has comma , inside the quotes", 2nd string, 3rd string'

I expect three tokens:

 "This string has comma , inside the quotes"
  2nd string
  3rd string

I used the following, but got an empty result:

regexp(str, '\^([^"]|"[^"]*")*?(,)\')

ans =

     []

What is the correct regexp syntax for this?

Shan

1 Answer


Without regular expressions

You could

  1. Detect the positions of commas outside double-quotation marks: these are the commas that have an even (possibly zero) number of double-quotation marks to their left.
  2. Split the string at those points.
  3. Remove commas at the end of all substrings except the last.

Code:

pos = find(~mod(cumsum(str=='"'),2)&str==',');                                   %// step 1
result = mat2cell(str, 1, diff([0 pos numel(str)]));                             %// step 2
result(1:end-1) = cellfun(@(x) x(1:end-1), result(1:end-1), 'uniformoutput', 0); %// step 3
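
The three steps can also be sketched with named intermediate values, which makes the even/odd quote test easier to follow. This is a minimal variant, assuming quotes are balanced and never escaped; the names insideQuotes, starts, stops and tokens are illustrative, and strtrim is added here only to drop the space after each comma.

```matlab
% Example string from the question
str = '"This string has comma , inside the quotes", 2nd string, 3rd string';

% Running count of quotation marks: an odd count means a quote is still open
insideQuotes = mod(cumsum(str == '"'), 2) == 1;

% Step 1: commas that are not inside an open quote
pos = find(str == ',' & ~insideQuotes);

% Steps 2 and 3: cut between split positions, skipping the commas themselves
starts = [1, pos + 1];
stops  = [pos - 1, numel(str)];
tokens = arrayfun(@(a, b) strtrim(str(a:b)), starts, stops, 'UniformOutput', false);
```

Indexing around the commas directly avoids the separate trimming pass, at the cost of two extra index vectors.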

With regular expressions

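Split at commas that are preceded by a complete quoted substring. This works for the example, where the only quoted field comes first; note that the pattern does not actually count quotation marks, so it is not a general even-number test: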

result = regexp(str,'(?<=(".*").*),', 'split');
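
If a quoted field may appear anywhere in the string, a lookahead-based pattern may be more robust. The following is a sketch under the assumption that quotes are balanced and never escaped: it splits at commas that are followed by an even (possibly zero) number of double-quotation marks up to the end of the string, which is equivalent to an even number on the left when the total is even. The second example string is made up for illustration.

```matlab
% Split at commas followed by an even number of quotation marks
pattern = ',(?=(?:[^"]*"[^"]*")*[^"]*$)';

% Example string from the question
str1 = '"This string has comma , inside the quotes", 2nd string, 3rd string';
result1 = strtrim(regexp(str1, pattern, 'split'));

% Hypothetical string with the quoted field in the middle
str2 = 'first, "second , with comma", third';
result2 = strtrim(regexp(str2, pattern, 'split'));
```

Here strtrim is applied to the whole cell array to remove the space that follows each comma; omit it if leading whitespace should be preserved.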
Luis Mendo