
I am trying to process the output of a system('./foo') command. If I redirect the output to a file with system('./foo > output') and read that file into MATLAB with dlmread, it works fine, but I want to avoid writing a huge ASCII file (about 1e7 lines) to the hard disk every time I do this.
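
For reference, the file-based approach I want to replace looks roughly like this (a minimal sketch; 'output' is just a placeholder file name):

% baseline: let foo write a huge ASCII file, then parse it with dlmread
system('./foo > output');      % writes ~1e7 lines to disk
data = dlmread('output');      % reads the file back into a matrix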

So I want to deal with the output directly by reading it into a huge string and splitting the string. It works fine for small files:

[a,b] = system('./foo');
b = strsplit(b);
b = cellfun(@str2num, b, 'UniformOutput', 0);
b = cell2mat(b);

Unfortunately, this already consumes far too much memory in the strsplit step, so MATLAB gets killed by the OOM killer. I found this alternative:

b=textscan(b,'%s','delimiter',' ','multipleDelimsAsOne',1);

But it also consumes way too much memory.

Can somebody suggest a better way to split that string of numbers and read it into a matrix, or more generally, a way to avoid writing the command's output to a file on the hard disk?
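
To make the goal concrete, what I am after is roughly the following (a rough sketch, not tested on the full data set; it assumes the output is purely numeric and whitespace-separated, and ncols is a placeholder for the known number of columns):

[~, out] = system('./foo');            % capture stdout as one string
vals = sscanf(out, '%f');              % parse all numbers in a single pass
data = reshape(vals, ncols, []).';     % ncols: placeholder for the column count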

Edit (I'm writing here because there is not enough space in the comments): @MZimmerman6 I have now tried dlmread with and without pre-allocation, as well as your proposal as I understood it. In fact, the loop is much slower than dlmread:

clear all
close all

% Variant 1: dlmread without pre-allocation
tic
ttags1 = dlmread('tmp.txt', ' ', 1, 3);
toc

clear all

% Variant 2: count lines, pre-allocate, then dlmread
tic
[~, result] = system('perl -e ''while(<>){};print $.,"\n"'' tmp.txt');
numLines1 = str2double(result);
ttags = zeros(numLines1, 1);
ttags = dlmread('tmp.txt', ' ', 1, 3);
toc

clear all

% Variant 3: line-by-line parsing into a cell array (as in the answer below)
tic
fid = fopen('tmp.txt');
[~, result] = system('perl -e ''while(<>){};print $.,"\n"'' tmp.txt');
numLines1 = str2double(result);
temp = cell(numLines1, 1);
for i = 1:numLines1
    tline = fgetl(fid);
    if ischar(tline)
        vals = textscan(tline, '%f', 'delimiter', ',');
        temp{i} = transpose(vals{1});
    end
end
fclose(fid);
temp = cell2mat(temp);
toc

The results are:

Elapsed time is 19.762470 seconds.
Elapsed time is 21.546079 seconds.
Elapsed time is 796.755343 seconds.

Am I doing something wrong?

Thank you & best regards

1 Answer


You should not try to read the entire file into memory, as this can be extremely memory-heavy. I would recommend reading the file line by line, processing each line individually, and storing the results in a cell array. Once the parsing is done, you can convert that cell array into a normal matrix.

The first thing I would do is create a small Perl script that counts the number of lines in the file you are reading, so you can pre-allocate memory for the data. Call this file countlines.pl. Information gathered from here.

Perl - countlines.pl

 while (<>) {};
 print $.,"\n";

This script is only two lines, but it will quickly count the total number of lines in the file.

You can then use its result to pre-allocate and do your line-by-line parsing. In my testing I used a simple comma-separated file, so you can adjust the textscan format to handle your data as you want.

MATLAB Script

% get number of lines in the data file
numLines = str2double(perl('countlines.pl','text.txt'));

% pre-allocate a cell array, then parse the file line by line
fid = fopen('text.txt');
temp = cell(numLines,1);
for i = 1:numLines
    tline = fgetl(fid);
    if ischar(tline)
        vals = textscan(tline,'%f','delimiter',',');
        temp{i} = transpose(vals{1});
    end
end
fclose(fid);

% convert the cell array into a regular numeric matrix
temp = cell2mat(temp);

This should run relatively quickly depending on your file size, and do what you want. Of course you can edit how the parsing is done inside the loop, but this should be a good starting point.
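
If the file is purely numeric, one possible variation (not from the original answer; it assumes a known, fixed number of columns, here the placeholder ncols) is to let textscan stream the whole file from the file handle instead of calling it once per line, which still avoids building one giant string in memory:

% sketch: stream a purely numeric, comma-separated file in one textscan call
fid = fopen('text.txt');
fmt = repmat('%f', 1, ncols);                        % ncols is a placeholder
C = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);
data = C{1};                                         % numLines-by-ncols matrix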

Note for the future: do not try to read large amounts of data into memory unless it is completely necessary.

  • +1. Other ways to count lines fast [here](http://stackoverflow.com/questions/12176519/is-there-a-way-in-matlab-to-determine-the-number-of-lines-in-a-file-without-loop). – horchler Feb 05 '14 at 21:30
  • And can't one use something like the following: `[~,result]=system('perl -e ''while(<>){};print$.,"\n"'' text.txt');` `numLines=str2double(result);`? Or does using a file make it faster or is this just not cross-platform? – horchler Feb 05 '14 at 21:45
  • Ok, thank you for your answers! I will indeed check whether I can avoid loading the entire file at once. I was just wondering why you used a cell array to read the data? Is that better than allocating the space with zeros(numLines,1) if I know that I want to read numbers? Thanks again, also to horchler! – Mechanix Feb 06 '14 at 07:34
  • the only reason I used a cell array is because it is more dynamic when pre-allocating space and also if your file contains strings as well as numbers, a cell array can contain multiple data types while a matrix can not. @horchler as for whether or not a separate file makes it faster, I don't know, but I like to have the file so that I can reuse the code in multiple places by simply putting that perl file in the MATLAB root. Just personal preference – MZimmerman6 Feb 06 '14 at 13:24