I am looking for an elegant means of extracting nested data from a MATLAB data structure

Question

Using MATLAB, other than the brute-force technique of nested FOR loops, is there a more elegant means of extracting the X & Y data from the sample data structure shown below? I haven't been able to devise an elegant way of doing this in MATLAB using bsxfun, arrayfun, or structfun.

% Create an example of the input structure that I need to parse
for i =1:100
    setName = ['n' num2str(i)];
    for j = 1:randi(10,1)
        repName = ['n' num2str(j)];
        data.sets.(setName).replicates.(repName).X = i + randn();
        data.sets.(setName).replicates.(repName).Y = i + randn();
    end
end

clearvars -except data

% Brute force technique using nested FOR Loops to extract X & Y from this
% nested structure for easy plotting. Is there a better way to extract the
% X & Y values created above without using FOR loops?

n = 1;
setNames = fieldnames(data.sets);
for i =1:length(setNames)
    replicateNames = fieldnames(data.sets.(setNames{i}).replicates);
    for j = 1:length(replicateNames)
        X(n) = data.sets.(setNames{i}).replicates.(replicateNames{j}).X;
        Y(n) = data.sets.(setNames{i}).replicates.(replicateNames{j}).Y;
        n = n+1;
    end
end

scatter(X,Y);

What does this have to do with "defensive programming"? Why is it tagged as such? — Disillusioned, Jun 15 '14 at 09:01

Amro · Accepted Answer · 2014-06-07T04:54:21.050

MATLAB works best with arrays/matrices (be it numeric arrays, struct arrays, cell arrays, object arrays, etc.). The language offers constructs to slice and index into arrays easily.

So the idiomatic way in MATLAB would have been to create a non-scalar structure array, as opposed to a deeply nested structure.

For example, let's first convert the nested structure into a 2D array of structures, where the first dimension denotes the "replicates" and the second dimension denotes the "sets":

ds = struct('X',[], 'Y',[]);
sets = fieldnames(data.sets);
for i=1:numel(sets)
    reps = fieldnames(data.sets.(sets{i}).replicates);
    for j=1:numel(reps)
        ds(j,i) = data.sets.(sets{i}).replicates.(reps{j});
    end
end

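The result is a 10-by-100 structure array, each element having two fields, X and Y. Since each set has between 1 and 10 replicates, elements with no matching replicate are left with empty X and Y: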

>> ds
ds = 
10x100 struct array with fields:
    X
    Y

Accessing data.sets.n99.replicates.n9 in the original structure would be equivalent to ds(9,99) in the new structure.

>> data.sets.n99.replicates.n9
ans = 
    X: 100.3616
    Y: 98.8023

>> ds(9,99)
ans = 
    X: 100.3616
    Y: 98.8023

This new struct has the benefit that it can easily be accessed using array-indexing notation and comma-separated lists. So we can extract the X and Y vectors like you did simply as:

XX = [ds.X];    % or XX = cat(2, ds.X)
YY = [ds.Y];
scatter(XX, YY, 1)
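
To see why `[ds.X]` works even though some slots of the array are unfilled, here is a minimal sketch on a tiny 2-by-2 array (the values are made up for illustration). `ds.X` expands to a comma-separated list in column-major order, and empty entries contribute nothing to the concatenation:

```matlab
% Tiny stand-in for the 10-by-100 array; values are illustrative only
ds = struct('X', [], 'Y', []);
ds(1,1).X = 1;  ds(1,1).Y = 10;
ds(2,2).X = 4;  ds(2,2).Y = 40;   % growing to 2-by-2 pads ds(2,1) and ds(1,2) with empty fields

XX = [ds.X];    % same as horzcat(ds(1,1).X, ds(2,1).X, ds(1,2).X, ds(2,2).X)
YY = [ds.Y];
disp(size(ds))  % 2-by-2, although only two elements hold data
disp(XX)        % the two empty entries are dropped
```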

So if you had control over building the struct, I would design it as described above to begin with. Otherwise the double for-loop in your code with the dynamic field names is the best way to extract the values from it.
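
As a rough sketch of what building the struct array directly could look like, assuming you control the code that generates the data, the example from the question might be rewritten as follows (`nSets` is a name introduced here for illustration):

```matlab
% Build the data as a replicates-by-sets struct array from the start
nSets = 100;                         % assumed number of sets, as in the question
ds = struct('X', [], 'Y', []);
for i = 1:nSets
    for j = 1:randi(10)              % between 1 and 10 replicates per set
        ds(j,i).X = i + randn();
        ds(j,i).Y = i + randn();
    end
end

% No nested traversal is needed afterwards
XX = [ds.X];
YY = [ds.Y];
scatter(XX, YY);
```

The loops that create the data remain, but the extraction step collapses to two lines.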

You could probably write a chain of nested structfun calls, but that won't be the most readable code. Here is what I came up with to flatten the nested structure:

D = structfun(@(n) ...
        structfun(@(nn) [nn.X nn.Y], n.replicates, 'UniformOutput',false), ...
        data.sets, 'UniformOutput',false);

The resulting structure can be accessed with fewer levels of nesting:

>> D.n99.n9
ans =
  100.3616   98.8023

Slightly better than the original one, but still not easily traversed without some for-loops.
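
For completeness, here is one possible way to pull out X and Y with no explicit for-loops, using `struct2cell` and `cellfun` instead of `structfun`. This is only a sketch and not part of the approach above: it assumes every leaf struct has exactly the fields X and Y, and `cellfun` is still a loop under the hood. The small `data` struct is a stand-in with the same layout as in the question:

```matlab
% Small stand-in for the question's structure (same layout, fewer entries)
data = struct();
for i = 1:3
    for j = 1:2
        data.sets.(sprintf('n%d', i)).replicates.(sprintf('n%d', j)).X = i;
        data.sets.(sprintf('n%d', i)).replicates.(sprintf('n%d', j)).Y = 10*i + j;
    end
end

% Peel off the dynamic field names one level at a time
sets   = struct2cell(data.sets);                 % one cell per set
reps   = cellfun(@(s) struct2cell(s.replicates), sets, 'UniformOutput', false);
leaves = vertcat(reps{:});                       % column cell of scalar structs
leaves = [leaves{:}];                            % 1-by-N struct array with fields X, Y

X = [leaves.X];
Y = [leaves.Y];
```

Whether this reads better than the double for-loop is a matter of taste; it mainly avoids the running index `n`.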

here is a related post that might be of interest: http://stackoverflow.com/a/4169216/97160 — Amro, Jun 07 '14 at 04:57

score -1 · Answer 2 · edited May 23 '17 at 11:44

Since we are often "given" deeply nested structures from sources we can't control (other business units, customers, etc.), sometimes a baby's gotta do what a baby's gotta do. Here's a hack that seems to work to completely flatten a nested structure. Also posted here just in case one of these questions gets deleted. Copyright Carl Witthoft under usual GPL-3 rules.

%  struct2sims converter
function simout = struct2sims(structin)
fnam = fieldnames(structin);
for jf = 1:numel(fnam)
    subnam = [inputname(1),'_',fnam{jf}];
    if isstruct(structin.(fnam{jf}))
        % need to dive;  build a new variable that's not a substruct
        eval(sprintf('%s = structin.(fnam{jf});', fnam{jf}));
        eval(sprintf('simtmp = struct2sims(%s);',fnam{jf}) );
        % try removing the struct before getting any farther...
        simout.(subnam) = simtmp;
    else
        % at bottom, ok
        simout.(subnam) = structin.(fnam{jf});
    end
end
% need to unpack structs here, after each level of recursion
% returns...
subfnam = fieldnames(simout);
for kf = 1:numel(subfnam)
    if isstruct(simout.(subfnam{kf}))
        subsubnam = fieldnames(simout.(subfnam{kf}));
        for fk = 1:numel(subsubnam)
            simout.([inputname(1),'_',subsubnam{fk}])...
                = simout.(subfnam{kf}).(subsubnam{fk}) ;
        end
        simout = rmfield(simout,subfnam{kf});
    end
end
% if desired write to file with:
% save('flattened','-struct','simout');
end

answered Aug 09 '16 at 18:56 by Carl Witthoft · edited May 23 '17 at 11:44 by Community

this doesn't exactly address the question above, still it may be useful if someone wanted to flatten nested structs like that... However your code is not very efficient plus it uses `eval` ugh! Here is a better implementation: http://pastebin.com/Dr8Kh3n7. Applying it on the example above `d = struct2sims(data)`, you get a completely flat structure with fields of the form `data_sets_n1_replicates_n1_X, ..., data_sets_n100_replicates_n10_Y` – Amro Aug 10 '16 at 08:16

word of caution, MATLAB has a [maximum length for variable names](http://www.mathworks.com/help/matlab/ref/namelengthmax.html) of 63, so the names produced from deeply nested structs could get truncated when flattened this way... Perhaps you could also add an option to specify a max depth in order to limit the recursion level (so the user specifies a max depth level after which struct inputs are returned as-is without flattening them). For example with `depth=4` the output would be the fields `data_sets_n?_replicates_n?` each being a shallow struct with only `X` and `Y` – Amro Aug 10 '16 at 08:29
@Amro thanks for the upgrades and for the warnings. I'll try to get motivated :-) to implement the 'max depth' option and some name-length-checks. As it happens, this code was originally written for in-house use where our data sources never exceed three-deep. – Carl Witthoft Aug 10 '16 at 11:17

should be easy to implement the depth option. The idea is to add a third argument `function s_out = struct2sims(s_in, name, depth)` which defaults to 0 `if nargin < 3, depth = 0; end`, and you would increment it in the recursive call, like `s_tmp = struct2sims(val, subname, depth+1);`. You could then test whether the current depth is less than the specified limit, and if it exceeds it you stop the recursion and return the input struct as-is `s_out = struct(name,s_in); return;`. – Amro Aug 10 '16 at 12:08

@Amro I think I'd do it the other way: `struct2sims(s_in,name, Max)`, then recursively call with `Max-1` and stop when that reaches zero. – Carl Witthoft Aug 10 '16 at 15:20
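
The ideas discussed in these comments (no `eval`, a name-length check, and a depth limit that counts down) might be combined roughly as follows. This is only a sketch: `flattenStruct` and its argument names are hypothetical, it is neither the pastebin code nor the original `struct2sims`, and it assumes scalar structs throughout. The test `maxDepth > 1` is chosen so that a limit of 4 reproduces the `data_sets_n?_replicates_n?` example mentioned above.

```matlab
function s_out = flattenStruct(s_in, name, maxDepth)
% FLATTENSTRUCT  Flatten a nested scalar struct into one level (sketch).
%   name     - prefix used for the generated field names
%   maxDepth - levels to descend; deeper sub-structs are kept as-is
if nargin < 3, maxDepth = Inf; end       % default: flatten completely
s_out = struct();
fn = fieldnames(s_in);
for k = 1:numel(fn)
    val = s_in.(fn{k});
    subname = [name '_' fn{k}];
    if isstruct(val) && isscalar(val) && maxDepth > 1
        % Descend with a reduced depth budget and merge the flat result
        s_tmp = flattenStruct(val, subname, maxDepth - 1);
        fn_tmp = fieldnames(s_tmp);
        for m = 1:numel(fn_tmp)
            s_out.(fn_tmp{m}) = s_tmp.(fn_tmp{m});
        end
    else
        % Leaf value, or depth budget used up: store under the joined name
        if numel(subname) > namelengthmax
            error('flattenStruct:nameTooLong', ...
                'Field name "%s" exceeds namelengthmax.', subname);
        end
        s_out.(subname) = val;
    end
end
end
```

With the structure from the question, `flat = flattenStruct(data, 'data')` would give fields such as `data_sets_n1_replicates_n1_X`, while `flattenStruct(data, 'data', 4)` would stop one level earlier and keep each replicate as a small struct with `X` and `Y`.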
