I am reading a CSV file in MATLAB as follows.

data_f = fopen(fileName,'r');
while(~feof(data_f))
    line_f = fgetl(data_f);
    Temp(1,:) = regexp(line_f, ',', 'split');
end

My problem is that some of the columns in a row contain data in [a,b] format. When I use a regular expression with only ',' as the delimiter, it throws an error. How should I write the regular expression for this purpose?

For example, a line in the CSV file looks like this:

12,23,a,3,[1,2],5

I need it split as follows:

12 23 a 3 [1,2] 5

and not like this:

12 23 a 3 1 2 5
Amro
kaja
  • Not exactly a duplicate, but my own question may be useful. http://stackoverflow.com/questions/545584/regular-expressions-with-matching-brackets – Nzbuu Oct 20 '11 at 16:18
  • There are multiple regex questions about handling CSV files while ignoring commas contained inside quotations (you have brackets instead of quotes, but it's basically the same idea): http://stackoverflow.com/q/1189416, http://stackoverflow.com/q/632475, http://stackoverflow.com/q/639264, http://stackoverflow.com/q/2170184 – Amro Oct 20 '11 at 20:02

3 Answers

The following code will dissect the string s='3,4,[5,6],a,2'

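s = '3,4,[5,6],a,2';
bracket = false;   % true while the scan is inside [...]
start = 1;         % index where the current field begins
A = {};            % cell array, because the fields are strings of varying length

for i = 1:length(s)
  if s(i) == '['
    bracket = true;
  elseif s(i) == ']'
    bracket = false;
  elseif s(i) == ',' && ~bracket
    % a comma outside brackets ends the current field
    A{end+1} = s(start:i-1);
    start = i + 1;
  end
end
A{end+1} = s(start:end);   % the last field has no trailing comma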
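
If the bracketed fields can be nested, the same character scan can track a nesting depth instead of a true/false flag. The following is only a sketch under that assumption; the input string and the variable names (depth, start, fields) are made up for illustration.

```matlab
s = '1,[2,[3,4]],5';   % hypothetical input with nested brackets
depth = 0;             % current bracket nesting depth
start = 1;             % index where the current field begins
fields = {};           % collected fields, one string per cell

for i = 1:length(s)
  switch s(i)
    case '['
      depth = depth + 1;
    case ']'
      depth = depth - 1;
    case ','
      if depth == 0
        % only commas outside all brackets separate fields
        fields{end+1} = s(start:i-1);
        start = i + 1;
      end
  end
end
fields{end+1} = s(start:end);   % last field
% fields is {'1', '[2,[3,4]]', '5'}
```

This sketch assumes the brackets are balanced; unbalanced input would need extra checks.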

-edit after rereading your question-

Hoogendijk

I think it is better to use a CSV parser library for this purpose. But if you have a reason to use a regex, you can use this one:

,(?=[^]]*(\[|$))

This will not work if you have nested [].

Explanation: match a , that is followed by any number of non-] characters, which are in turn followed by either [ or $ (end of line). A comma inside brackets fails this test because the next bracket character after it is a ].
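
As a rough illustration of how this pattern could be used in MATLAB, here is a sketch that passes it to regexp with the 'split' option on the sample line from the question. Two details are my own choices, not part of the pattern above: the ] inside the character class is escaped as \] to be safe, and the group is written as non-capturing (?:...). The variable names are illustrative.

```matlab
% Split on commas that are not inside [...] (assumes brackets are not nested)
line_f  = '12,23,a,3,[1,2],5';
pattern = ',(?=[^\]]*(?:\[|$))';
parts   = regexp(line_f, pattern, 'split');
% parts is a 1x6 cell array: {'12','23','a','3','[1,2]','5'}
```

In the reading loop from the question, each line could be stored in its own cell (for example rows{end+1} = parts) so that lines with different field counts do not cause an assignment size mismatch.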

Narendra Yadala

As stated by others, it would be better to use a dedicated CSV parser library... Still, here is one possible solution:

file.csv

12,23,aaaaa,3,[1,2,3,4,5],5
222,33,b,4,[2],6
32,43,c,5,[3,4],7
42,534,ddd,6,[4,5,0],8
52,63,e,7,[5,6],9

MATLAB

%# cell array to hold the data
C = cell(0,6);

%# read file line-by-line
fid = fopen('file.csv','rt');
while ~feof(fid)
    tline = fgetl(fid);

    %# get [...] tokens and their locations (assuming there is one per line)
    [tok tokExt] = regexp(tline, '\[(.*)\]', 'tokens', 'tokenExtents', 'once');

    %# replace commas with space in tokens, and place back into line
    tline(tokExt(1):tokExt(2)) = strrep(tok{1},',',' ');

    %# split line by commas and store the parts read
    C(end+1,:) = textscan(tline, '%f %f %s %f %s %f', 'Delimiter',',');
end
fclose(fid);

%# reduce nested level of cell array
C(:,3) = vertcat(C{:,3});
C(:,5) = vertcat(C{:,5});

The result of reading the sample file above:

>> C
C = 
    [ 12]    [ 23]    'aaaaa'    [3]    '[1 2 3 4 5]'    [5]
    [222]    [ 33]    'b'        [4]    '[2]'            [6]
    [ 32]    [ 43]    'c'        [5]    '[3 4]'          [7]
    [ 42]    [534]    'ddd'      [6]    '[4 5 0]'        [8]
    [ 52]    [ 63]    'e'        [7]    '[5 6]'          [9]

Note that this is a cell array printed at the command prompt. MATLAB uses [] to denote numeric cell contents, so don't confuse those with the brackets read from the file, which appear inside the quoted strings of the fifth column.

If you want to get the numeric values of the fifth column, you can use STR2NUM:

C(:,5) = cellfun(@str2num, C(:,5), 'UniformOutput',false)
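
Since str2num evaluates its input with eval, a narrower parser may be preferable when the file contents are not trusted. A possible alternative, sketched here on a small stand-in for the fifth column (col5 and nums are illustrative names), strips the surrounding brackets and parses the numbers with sscanf:

```matlab
% Stand-in for C(:,5): bracketed, space-separated numbers stored as strings
col5 = {'[1 2 3 4 5]'; '[2]'; '[3 4]'};

% Drop the first and last character (the brackets), read all numbers,
% and transpose the resulting column vector into a row vector
nums = cellfun(@(t) sscanf(t(2:end-1), '%f').', col5, 'UniformOutput', false);
% nums{1} is [1 2 3 4 5], nums{2} is 2, nums{3} is [3 4]
```

This assumes every entry starts with [ and ends with ], as in the sample output above.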
Amro