I am trying to vectorize the following data structure in MATLAB, but I cannot find or code an efficient way to do it.
A = 1x2 struct array with fields: [a , b , c]
A(1) = a: 1 , b: 2 , c: [1x1 struct]
A(1).c = key: 5
A(2) = a: 1 , b: [] , c: [1x3 struct]
A(2).c = 1x3 struct array with fields: [key , key2]
A(2).c(1).key = 3
A(2).c(2).key = 4
A(2).c(3).key = 7
A(2).c(1).key2 = 10
A(2).c(2).key2 = []
A(2).c(3).key2 = 17
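For anyone who wants to reproduce this, here is a minimal sketch that builds the structure listed above. It uses only the struct function with cell-array arguments; the temporary names c1 and c2 are just for illustration.

```matlab
% A(1).c: 1x1 struct that has only the field "key"
c1 = struct('key', 5);

% A(2).c: 1x3 struct array with fields "key" and "key2" (one key2 is empty)
c2 = struct('key', {3, 4, 7}, 'key2', {10, [], 17});

% Top level: 1x2 struct array; A(2).b is empty
A = struct('a', {1, 1}, 'b', {2, []}, 'c', {c1, c2});
```

Note that A(1).c and A(2).c have different field sets, so [A.c] cannot be concatenated directly.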
I know this is a highly inefficient data structure. That's why I am trying to vectorize it with index fields, so that the final structure looks like this:
A = 1x1 structure with fields [a , b , c , b_index , c_index]
A.a = [1 1]
A.b = [2]
A.b_index = [1]
A.c = 1x1 structure with fields [key key2 key2_index]
A.c_index = [1 2 2 2]
A.c.key = [5 3 4 7]
A.c.key2 = [10 17]
A.c.key2_index = [2 4]
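To make the target layout concrete, here is a serial (non-parfor) sketch of how the value and index fields relate to the original data. It is only an illustration for this small example, not my actual code: it assumes every non-empty leaf is a scalar, and the variable names (out, bCells, key2Cells, ...) are made up for this sketch.

```matlab
% Example input, rebuilt from the listing above.
c1 = struct('key', 5);
c2 = struct('key', {3, 4, 7}, 'key2', {10, [], 17});
A  = struct('a', {1, 1}, 'b', {2, []}, 'c', {c1, c2});

% A field that is always populated concatenates directly.
out.a = [A.a];

% A field with gaps: keep the non-empty values and remember their positions.
bCells      = {A.b};
bKeep       = ~cellfun(@isempty, bCells);
out.b       = [bCells{bKeep}];
out.b_index = find(bKeep);

% c_index maps each flattened c element back to its parent element of A.
counts      = arrayfun(@(s) numel(s.c), A);
out.c_index = repelem(1:numel(A), counts);

% key2 does not exist at all in A(1).c, so pad those elements with empties.
key2Cells = {};
for k = 1:numel(A)
    ck = A(k).c;
    if isfield(ck, 'key2')
        key2Cells = [key2Cells, {ck.key2}];
    else
        key2Cells = [key2Cells, cell(1, numel(ck))];
    end
end
key2Keep         = ~cellfun(@isempty, key2Cells);
out.c.key2       = [key2Cells{key2Keep}];
out.c.key2_index = find(key2Keep);
```

The same mask-and-find pattern would apply to any field that can be empty or missing.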
My attempt 1:
I first tried a parfor at each level (in this example there are three levels: A, c, and key), preceded by a survey of each field to check whether it is empty, what data it contains, and whether it needs an index. If the field is not a struct, I combine it with vertcat(x.(fieldname)). If it is a struct, I package it up as a cell and recursively push it down to be vectorized.
That works, but unfortunately it takes too long. When I profiled it, a MEX distribution function was taking up essentially all of the time. I'm guessing that's because I run parfor at every level, so MATLAB has to index and distribute the data to each worker very frequently.
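What I have in mind by "distribute once" is roughly the sketch below: a single parfor over the top-level array, with the nested level handled serially inside each iteration. This is a simplified illustration for the key field only, not my real code, and I have not measured whether it is actually faster. The names keyCells and counts are made up for this sketch.

```matlab
% Example input, rebuilt from the listing above.
c1 = struct('key', 5);
c2 = struct('key', {3, 4, 7}, 'key2', {10, [], 17});
A  = struct('a', {1, 1}, 'b', {2, []}, 'c', {c1, c2});

n        = numel(A);
keyCells = cell(1, n);    % per-element results, sliced output
counts   = zeros(1, n);   % number of c elements per parent, sliced output

parfor k = 1:n
    ck          = A(k).c;     % A is indexed only by k, so it is sliced once
    keyCells{k} = [ck.key];   % nested level is processed serially here
    counts(k)   = numel(ck);
end

% Combine on the client after the single parallel pass.
key     = [keyCells{:}];
c_index = repelem(1:n, counts);
```

Without the Parallel Computing Toolbox the parfor loop simply runs serially, so the result is the same either way.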
My attempt 2:
I also tried surveying the whole structure with parfor first, storing a uint8 value for each field. Then, at the combination stage, I first check the survey results with vertcat to see whether I need an index and whether I need cat(3,...) for the data field. But the survey stage is slow and memory-inefficient, and the combination stage does not get much faster, although indexing becomes much easier.
I guess my questions are:
1. How can I code this so that parfor indexes and distributes the whole array only once, making my first attempt more efficient? Or is my second attempt the better idea?
2. What is a good general approach to the problem?