I have a huge text file that needs to be read and processed in MATLAB. At certain points the file contains a line of text indicating that a new data series has started. I have searched here but can't find a simple solution.

What I want to do is read the data in the file and put it in a table with three columns. Whenever a line of text is found, a new table should be created, and this should repeat until the entire document has been scanned.

This is what the document looks like:

    time    V(A,B)  I(R1)
    Step Information: X=1  (Run: 1/11)
    0.000000000000000e+000  -2.680148e-016  0.000000e+00
    9.843925313007988e-012  -4.753470e-006  2.216314e-011
    1.000052605772457e-011  -4.835427e-006  2.552497e-011
    1.031372754715773e-011  -4.999340e-006  -3.042096e-012
    1.094013052602406e-011  -5.327165e-006  -1.206968e-011
    Step Information: X=1  (Run: 2/11)
    0.000000000000000e+000  -2.680148e-016  0.000000e+000
    9.843925313007988e-012  -4.753470e-006  2.216314e-011
    1.000052605772457e-011  -4.835427e-006  2.552497e-011
    1.031372754715773e-011  -4.999340e-006  -3.042096e-012
    1.094013052602406e-011  -5.327165e-006  -1.206968e-011
rayryeng

2 Answers

A rather crude approach is to read the file line by line and check whether each line consists of three numbers. If it does, append it to a temporary matrix. When you reach a line that doesn't contain three numbers, append this matrix as an element of a cell array, clear the temporary matrix and continue.

Something like this would work, assuming that the file is stored in 'file.txt':

%// Open the file
f = fopen('file.txt', 'r');

%// Initialize empty cell array
data = {};

%// Initialize temporary matrix
temp = [];

%// Loop over the file...
while true
    %// Get a line from the file
    line = fgetl(f);

    %// If we reach the end of the file (fgetl returns -1, not text), get out
    if ~ischar(line)
        %// Last check before we break
        %// Check if the temporary matrix isn't empty and add
        if ~isempty(temp)
            data = [data; {temp}];
        end
        break; 
    end

    %// Else, check to see if this line contains three numbers
    numbers = textscan(line, '%f %f %f');

    %// If this line doesn't consist of three numbers...
    if all(cellfun(@isempty, numbers))
        %// If the temporary matrix is empty, skip
        if isempty(temp)
            continue;
        end
        %// Concatenate to cell array
        data = [data; {temp}];
        %// Reset temporary matrix
        temp = [];
    %// If this does, then create a row vector and concatenate
    else
        temp = [temp; numbers{:}];
    end
end

%// Close the file
fclose(f);

The code is fairly self-explanatory, but let's walk through it to be sure you know what's going on. First, open the file with fopen to get a "pointer" to the file, then initialize the cell array that will hold our matrices, as well as the temporary matrix that collects the rows found between header lines. Next, we loop over the file and grab one line at a time with fgetl, using the file pointer we created. On each iteration we check whether we have reached the end of the file. If we have, we check whether the temporary matrix still holds any numerical data; if it does, we add it to the cell array, and then we get out of the loop. Once the loop is done, fclose closes the file and cleans things up.

The heart of the operation is what follows this check. We use textscan to search for three numbers separated by whitespace, which is what the '%f %f %f' format specifier does. If the line holds three numbers, textscan returns a cell array of three elements, one number per cell. We then convert this cell array into a row of numbers and append it to the temporary matrix; temp = [temp; numbers{:}]; does both steps. Simply put, numbers{:} pieces the three numbers together horizontally into a single row, and that row is then concatenated as a new row at the bottom of the temporary matrix.
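As a small illustration of that step, here is a sketch that parses a single data line copied from the question. The variable names sampleLine and row are made up for this example and are not part of the code above.

```matlab
% One data line copied from the question's file
sampleLine = '9.843925313007988e-012  -4.753470e-006  2.216314e-011';

% textscan returns a 1x3 cell array, one cell per %f field
numbers = textscan(sampleLine, '%f %f %f');

% Expanding the cell inside brackets concatenates the three
% scalars horizontally into a single 1x3 row
row = [numbers{:}];

disp(size(row))
```

Appending row under an existing N-by-3 matrix with [temp; row] is what grows the temporary matrix by one row per data line.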

Should we get to a line that is all text, all three elements of the cell array returned by textscan will be empty. That's the purpose of the all and cellfun calls: we check each element in the cell to see if it's empty, and if every element is empty, the line is text. When this happens, take the temporary matrix and add it as a new entry in the cell array, then reset the temporary matrix and start the logic over again.
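To see the text check in isolation, here is a minimal sketch that runs the same test on one of the header lines from the question. The names headerLine and isText are only for illustration.

```matlab
% A header line copied from the question's file
headerLine = 'Step Information: X=1  (Run: 1/11)';

% textscan cannot convert 'Step' to a number, so it stops right away
% and every cell in the result stays empty
numbers = textscan(headerLine, '%f %f %f');

% True only when all three cells are empty, i.e. the line is text
isText = all(cellfun(@isempty, numbers));

disp(isText)
```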

However, we also have to account for multiple consecutive lines of text. That's what the additional if statement inside the first if block (the one using all) is for. If a line of text immediately follows another line of text, the temporary matrix is still empty, so we check for that before trying to concatenate it. If it's empty, don't bother and just continue.
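The effect of that extra check can be tried without a file. The sketch below runs the same logic over a few in-memory lines, with two text lines in a row at the top, as in the question. The sampleLines variable and its short numeric rows are invented for this example.

```matlab
% In-memory stand-in for the file: two consecutive text lines, then data
sampleLines = {'time    V(A,B)  I(R1)', ...
               'Step Information: X=1  (Run: 1/11)', ...
               '1 2 3', ...
               '4 5 6', ...
               'Step Information: X=1  (Run: 2/11)', ...
               '7 8 9'};

data = {};
temp = [];
for k = 1:numel(sampleLines)
    numbers = textscan(sampleLines{k}, '%f %f %f');
    if all(cellfun(@isempty, numbers))
        % Text line: store the block only if one has been collected
        if ~isempty(temp)
            data{end+1, 1} = temp;
            temp = [];
        end
    else
        % Data line: append as a new row
        temp = [temp; numbers{:}];
    end
end

% Store the last block, which has no text line after it
if ~isempty(temp)
    data{end+1, 1} = temp;
end
```

Without the isempty check, the second text line would add an empty matrix to data before any numbers had been read.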


After running this code, I get the following for my data matrix:

>> format long g
>> celldisp(data)


data{1} =

                         0             -2.680148e-16                         0
      9.84392531300799e-12              -4.75347e-06              2.216314e-11
      1.00005260577246e-11             -4.835427e-06              2.552497e-11
      1.03137275471577e-11              -4.99934e-06             -3.042096e-12
      1.09401305260241e-11             -5.327165e-06             -1.206968e-11



data{2} =

                         0             -2.680148e-16                         0
      9.84392531300799e-12              -4.75347e-06              2.216314e-11
      1.00005260577246e-11             -4.835427e-06              2.552497e-11
      1.03137275471577e-11              -4.99934e-06             -3.042096e-12
      1.09401305260241e-11             -5.327165e-06             -1.206968e-11

To access a particular "table", use data{ii}, where ii is the index of the table you want. The tables are numbered in the order they were read from the text file, from top to bottom.
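If you want actual MATLAB table objects with three named columns rather than plain matrices, one option is array2table, assuming your MATLAB version has tables (R2013b or later). This is only a sketch: the column names below are my own guesses based on the header line, since parentheses and commas are not valid in variable names, and the small data cell is a stand-in for the one produced above.

```matlab
% Stand-in for the cell array built above (first two rows of the sample data)
data = {[0, -2.680148e-16, 0; ...
         9.843925313007988e-12, -4.753470e-06, 2.216314e-11]};

% Hypothetical column names derived from 'time V(A,B) I(R1)'
names = {'time', 'V_AB', 'I_R1'};

% Convert every matrix into a table with named columns
tables = cell(size(data));
for ii = 1:numel(data)
    tables{ii} = array2table(data{ii}, 'VariableNames', names);
end

% Columns can then be accessed by name
firstTimeColumn = tables{1}.time;
```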

rayryeng

The most versatile way is to read line by line using textscan. If you want to speed this up, you can do a dummy read first, i.e. loop through all the lines without storing any data, decide which lines are text and which are numbers, and count the number of data lines in each set. You then know enough about the data to run through the file quickly a second time, and you can pre-allocate the arrays inside the data cell if you wish, which can speed up storing the data considerably for a large file. The second loop is the one that actually reads the data into the arrays, and by then you know which lines to skip.

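fid = fopen('file.txt','r');

% a data line holds only digits, signs, '.', exponent letters and whitespace
isData = @(s) ~isempty(strtrim(s)) && all(ismember(s, [' ' char(9) '0123456789+-.eE']));

% first pass: count the data lines that follow each text line
nlines = [];
k = 0;  % counter for data sets
while true
    tline = fgetl(fid);
    if ~ischar(tline), break; end      % end of file
    if isData(tline)                   % is it data
        if k == 0                      % data before any text line
            k = 1;
            nlines(k) = 0;
        end
        nlines(k) = nlines(k) + 1;
    else                               % is it text
        k = k + 1;
        nlines(k) = 0;
    end
end

frewind(fid);  % go back to start of file

% preallocate one matrix per data set
data = cell(numel(nlines), 1);
for aa = 1 : numel(nlines)
    data{aa} = zeros(nlines(aa), 3);
end

% second pass: now get the data
k = 0;
row = 0;
while true
    tline = fgetl(fid);
    if ~ischar(tline), break; end
    if isData(tline)
        if k == 0, k = 1; end
        row = row + 1;
        C = textscan(tline, '%f %f %f');
        data{k}(row, :) = [C{:}];
    else                               % text line: move on to the next set
        k = k + 1;
        row = 0;
    end
end

fclose(fid);

% drop the empty sets left behind by consecutive text lines
data = data(nlines(:) > 0);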