The function fgetl is used to read a single line from a text file, so one option would be to write a loop which repeatedly reads a single line using fgetl and checks if the first column contains "5000" before deciding whether to include it in your data set or not. This is the solution presented in il_raffa's answer. Notice that you actually have to read the entire file anyway, since you read the entire line with fgetl and then use textscan on it! So it certainly won't be any faster than reading the entire file and then filtering it (though it may be more memory-efficient).
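For reference, the line-by-line approach could be sketched roughly as follows. This is only a minimal illustration, not il_raffa's actual code: the function name filter_lines is made up, and it assumes a tab-delimited file with one header line and treats the first field as text.

```matlab
function kept = filter_lines(filename, wanted)
% Hypothetical helper: return the lines whose first tab-delimited field
% equals the string `wanted` (for example '5000').
fid = fopen(filename);
if fid == -1
    error('Could not open %s', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file on exit or error

fgetl(fid);                            % skip the header line (assumed present)
kept = {};
tline = fgetl(fid);
while ischar(tline)                    % fgetl returns -1 at end of file
    fields = regexp(tline, '\t', 'split');
    if strcmp(fields{1}, wanted)
        kept{end+1} = tline;           %#ok<AGROW> keep the whole line
    end
    tline = fgetl(fid);
end
end
```

Every line still passes through fgetl and gets split, which is why this saves memory but not time.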
Really what you want is to read the file character by character, aborting each line if you can determine that you won't be reading it, based on the value of the "A" column.
If you were writing C or another low-level language this would probably be faster than importing the entire file and filtering it afterward. However, because of the overhead introduced by MATLAB it will almost certainly be faster and easier to read the entire file and filter it later. The textscan function is pretty good (and speedy) at reading delimited files, and 200MB is really not that large (it fits comfortably into memory on any modern computer, for example). You should just make sure to filter each data set right after reading it, rather than reading all data sets and then filtering them all, so that only one unfiltered file is in memory at a time.
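A rough sketch of the read-then-filter approach is below. The name read_and_filter is hypothetical, and the sketch assumes a tab-delimited file with a header line, reads every column as text, and matches the first column against a string.

```matlab
function columns = read_and_filter(filename, wanted)
% Hypothetical helper: read every column as text, then keep only the rows
% whose first column equals the string `wanted` (for example '5000').
fid = fopen(filename);
if fid == -1
    error('Could not open %s', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file on exit or error

hdr = fgetl(fid);                      % header line (assumed present)
num_cols = numel(regexp(hdr, '\t', 'split'));
columns = textscan(fid, repmat('%s', 1, num_cols), 'Delimiter', '\t');

keep = strcmp(columns{1}, wanted);     % logical mask over the rows
columns = cellfun(@(c) c(keep), columns, 'UniformOutput', false);
end
```

If you have several files, call this once per file inside your loop, so each data set is already filtered before the next one is read.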
To the second part of your question, regarding whether you can selectively import columns: textscan doesn't provide a built-in way to pick columns by their header names. However, it isn't that tricky if you can make a few assumptions about your file format. If we assume that
- The file is in comma or tab delimited format
- It has a header line
Then you can read the header line (using fgetl) which will tell you how many columns there are, and what their names are. You can then use that information to build a call to textscan which will read the delimited columns, and filter out the ones whose headers don't match what you need. A simple version of this might look like this:
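function columns = import_columns(filename, headers)
    % Return only the columns whose header appears in the cell array `headers`.
    % This version assumes tabs; use ',' in place of '\t' for comma-delimited files.
    fid = fopen(filename);
    if fid == -1
        error('import_columns:openFailed', 'Could not open %s', filename);
    end
    closer = onCleanup(@() fclose(fid)); % close the file on exit, even after an error
    hdr = fgetl(fid); % the first line holds the column names
    column_headers = regexp(hdr, '\t', 'split'); % split on tabs
    num_cols = length(column_headers);
    format_str = repmat('%s', 1, num_cols); % create a string like '%s%s%s%s'
    columns = textscan(fid, format_str, 'Delimiter', '\t');
    required_cols = ismember(column_headers, headers);
    columns(~required_cols) = []; % remove the columns you don't need (file order is kept)
end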
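
As a possible variation, textscan's format string accepts '%*s' to skip a field, so the unwanted columns can be dropped while parsing instead of afterwards. The sketch below is a hypothetical variant (the name import_columns_skip is made up) under the same assumptions as above: tab-delimited, one header line, everything read as text.

```matlab
function columns = import_columns_skip(filename, headers)
% Hypothetical variant of import_columns: unwanted columns are skipped by
% textscan itself (via '%*s') instead of being removed afterwards.
fid = fopen(filename);
if fid == -1
    error('Could not open %s', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file on exit or error

hdr = fgetl(fid);
column_headers = regexp(hdr, '\t', 'split');
required_cols = ismember(column_headers, headers);

specs = repmat({'%*s'}, 1, numel(column_headers)); % skip every column by default
specs(required_cols) = {'%s'};                     % read only the requested ones
format_str = [specs{:}];                           % e.g. '%s%*s%s'

columns = textscan(fid, format_str, 'Delimiter', '\t');
end
```

You would call it the same way as the original, for example cols = import_columns_skip('data.txt', {'A', 'C'}); where the file name and header names are placeholders. The result has one cell per requested column, in the order they appear in the file.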