The lowest level functions to read a file are,
(as described in Import Text Data Files with Low-Level I/O):
In your case, the input file is ascii, not binary, so right away we can drop the last option (fread
).
You're left with fgetl/fgets
(used to read a file line by line then parse each line) and fscanf
.
You already have two answers using the line by line approach, so I won't detail this one and rather show you how you can use fscanf
(since your data are suitable, they are indeed organised in a formated pattern).
The advantage of using fscanf
, is that provided you use the right formatSpec
parameter, the function will be able to read the whole file in one go, instead of iterating line by line.
That is true for all the numeric data in your file. We will have to do a second pass for the text elements in your last column.
Defining the format specifiers:
first lets define your format specifications. We will use a different format for each pass. The first pass will read all the numeric data, but will skip the text fields, while the second pass will do the opposite, ignore all numeric data and read only the text fields. The '*'
character is really useful for that when defining a format specifier:
DataFormatSpec = repmat('%d,',1,8) ;
DataFormatSpec = [DataFormatSpec '%*s'] ; % yield: '%d,%d,%d,%d,%d,%d,%d,%d,%*s'
TextFormatSpec = repmat('%*d,',1,8) ;
TextFormatSpec = [TextFormatSpec '%s'] ; % yield: '%*d,%*d,%*d,%*d,%*d,%*d,%*d,%*d,%s'
I've used %d
for all columns because in your sample data I did not see the variety of numeric types you mentionned. You can easily substitute this for %f
if the data requires it, and you can mix both types without problem in the same specifier (as long as they are all numeric). Just use what makes sense for each column.
Armed with that, let's get to the file. The data are in a formated pattern after the headerline, so we first need to get past the header line before calling fscanf
. We'll do that as you've done:
%% Open file and retrieve header line
fid = fopen( filein,'r') ;
hdr = fgetl(fid) ; % Retrieve the header line
DataStartIndex = ftell(fid) ; % save the starting index of data section
The call to ftell
allows us to save the position of the file pointer. After we read the header, the pointer is located right at the beginning of the data section. We save it to be able to rewind the file pointer to the same point for the second reading pass.
Reading the numeric data:
This is done extremely rapidly by fscanf
in a simple call:
%% First pass, read the "numeric values"
dataArray = fscanf( fid , DataFormatSpec , [8 Inf]).' ;
Notice the transpose operator .'
at the end of the line. This is because fscanf
populate the values it reads in a column major order, but the data in the text file are read in a line major order. The transpose operation at the end is just to have an output array the same dimension than it was in the text file.
Now dataArray
contains all your numeric data:
>> dataArray
dataArray =
18 8 318 150 3436 11 70 1
16 8 304 150 3433 12 70 1
Reading the text data:
This is were it gets slightly more complex. fscanf
automatically convert text characters to their ascii value. It's easy enough to convert that back to actual characters (using the function char()
). The biggest hurdle is that if we read all the texts fields in one go, they will all appear as a sucession of numbers, but there will be no way to know where each string stops and where the next starts. To overcome that, we'll do a reading line by line, but still using fscanf
:
%% Second pass, read the "text" values
fseek(fid,DataStartIndex,'bof') ; % Rewind file pointer to the start of data section
nRows = size(dataArray,1) ; % How many lines we'll need to read
textArray = cell(nRows,1) ; % pre allocate a cell array to receive the text column elements
for iline=1:nRows
textArray{iline,1} = char( fscanf(fid,TextFormatSpec,1).' ) ;
end
fclose(fid) ; % Close file
Notice again the use of the transpose operator .'
, and the use of char()
. Now textArray
is a cell array containing all your text fields:
>> textArray
textArray =
'sampletext'
'sampletext2'
Regrouping data set:
Personnally, I would keep the two arrays separate as they are the most optimized containers for each type of data (double
array for numeric data, and cell
array for an array of strings). However, if you need to regroup them in a single data structure, you can either use a cell array:
%% Optional, merge data into cell array
FullArray = [num2cell(dataArray) textArray]
FullArray =
[18] [8] [318] [150] [3436] [11] [70] [1] 'sampletext'
[16] [8] [304] [150] [3433] [12] [70] [1] 'sampletext2'
Or you could use a table
:
%% Optional, merge data into a table
T = array2table(dataArray) ;
T.text = textArray ;
T.Properties.VariableNames = [cellstr(reshape(sprintf('v%d',1:8),2,[]).') ; {'text'}] ;
Which gives:
T =
v1 v2 v3 v4 v5 v6 v7 v8 text
__ __ ___ ___ ____ __ __ __ _____________
18 8 318 150 3436 11 70 1 'sampletext'
16 8 304 150 3433 12 70 1 'sampletext2'
Obviously if you choose the table version, use the variable names you parsed from the header instead of the automatically generated one I used in this example.