3

I have a CSV file that has 20 columns. Some of the columns have number values, others have text values, and the text ones may or may not contain commas.

CSV content example:

column1, column2, column3, column4
"text value 1", 123, "text, with a comma", 25
"another, comma", 456, "other text", 78

I'm using textscan function, but I'm getting the most buggy and weird behavior. With some arguments, it reads all the values in only one column, sometimgs it repeats columns, and most of the things I've tried lead to the commas being incorrectly interpreted as column separators (despite text being enclosed in double quotes). That is, I've tried specifying 'delimiter' argument, and also including literals in the format specification, to no avail.

What's the correct way of invoking textscan to deal with a CSV file as the example above? I'm looking for a solution that runs both on MATLAB and on Octave (or, if that's not possible, the equivalent solution in each one).

jotadepicas
  • 2,389
  • 2
  • 26
  • 48

2 Answers2

9

For GNU Octave, using io package

pkg load io
c = csv2cell ("jota.csv")

gives

c = 
{
  [1,1] = column1
  [2,1] = text value 1
  [3,1] = another, comma
  [1,2] =  column2
  [2,2] =  123
  [3,2] =  456
  [1,3] =  column3
  [2,3] =  text, with a comma
  [3,3] =  other text
  [1,4] =  column4
  [2,4] =  25
  [3,4] =  78
}

btw, you should explicitly mention if the solution should run on GNU Octave, Matlab or both

Andy
  • 7,931
  • 4
  • 25
  • 45
  • Thanks. I clarified that I'm looking for a solution that runs on both Octave and MATLAB. I think `csv2cell` doesn't exist in MATLAB by default? – jotadepicas Oct 20 '17 at 15:11
  • @jotadepicas looks similiar: https://de.mathworks.com/matlabcentral/fileexchange/20836-csv2cell – Andy Oct 21 '17 at 07:17
2

First, read the column headers using the format '%s' four times:

fileID = fopen(filename);
C_text = textscan(fileID,'%s', 4,'Delimiter',',');

Then use the conversion specifier, %q to read the text enclosed by double quotation marks ("):

C = textscan(fileID,'%q %d %q %d','Delimiter',',');
fclose(fileID);

(This works for reading your sample data on Octave. It should work on MATLAB, too.)

Edit: removed redundant fopen.

ali14
  • 124
  • 1
  • 7
  • Hi, I don't understand how variable `C_text` of the first statement relates to the second `textscan`. Aren't those two `textscan` invocations unrelated to each other? – jotadepicas Oct 20 '17 at 15:12
  • 1
    Hi, The first textscan simply reads in the headers and stores it in the C_text. By doing so, the file pointer will point to the second line, and here enters the second textscan. It will parse the rest of the file, because it's not possible to combine the two textscans (reading the header and the data in one call because of different formats). You can later concat C_text and C if you need to have the header along with the data in one variable. – ali14 Oct 20 '17 at 18:50