I am trying to vectorize the following data structure in MATLAB, but I cannot find or code an efficient way to do it.
A = 1x2 struct array with fields: [a , b , c]
A(1) = a: 1 , b: 2 , c: [1x1 struct]
A(1).c = key: 5
A(2) = a: 1 , b: [] , c: [1x3 struct]
A(2).c = 1x3 struct array with fields: [key , key2]
A(2).c(1).key = 3
A(2).c(2).key = 4
A(2).c(3).key = 7
A(2).c(1).key2 = 10
A(2).c(2).key2 = []
A(2).c(3).key2 = 17
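
For anyone who wants to reproduce this, the example above can be rebuilt with something like the following. This is only a minimal sketch of the toy example; the variable names `c1` and `c2` are just for illustration, and the real data is much larger and deeper.

```matlab
% Build the nested struct arrays first; c1 has no key2 field on purpose.
c1 = struct('key', 5);                                  % 1x1 struct
c2 = struct('key', {3, 4, 7}, 'key2', {10, [], 17});    % 1x3 struct array

% Cell-array values make struct() return a 1x2 struct array.
A = struct('a', {1, 1}, 'b', {2, []}, 'c', {c1, c2});
```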

I know this is a highly inefficient data structure. That's why I am trying to vectorize it with index fields, so that the final structure looks like this:
A = 1x1 structure with fields [a , b , c , b_index , c_index]
A.a = [1 1]
A.b = [2]
A.b_index = [1]
A.c = 1x1 structure with fields [key key2 key2_index]
A.c_index = [1 2 2 2]
A.c.key = [5 3 4 7]
A.c.key2 = [10 17]
A.c.key2_index = [2 4]

My attempt 1: I first tried parfor at each level (for this example there are 3 levels: A, c, key), starting with a survey of each field to see whether it is empty, what data it contains, and whether I need to index it. Then I call vertcat(x.(fieldname)) if the field is not a structure. If it is a structure, I package it up as a cell and recursively push it down to be vectorized.
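
To make the recursion concrete, here is a stripped-down serial sketch of the idea, without parfor and without the separate survey pass. It is not my actual code: `flattenStruct` is a hypothetical name, it concatenates leaves horizontally rather than with vertcat, and it assumes every non-struct leaf is a scalar or empty.

```matlab
function out = flattenStruct(S)
% flattenStruct  Hypothetical helper: serial sketch, no parfor.
% S is a struct array, or a cell array of 1x1 structs whose fields may differ.
% Assumption: every non-struct leaf value is a scalar or empty.
    if isstruct(S)
        S = num2cell(S(:).');              % 1xN cell of 1x1 structs
    end

    % Collect the union of field names, keeping first-seen order.
    names = {};
    for i = 1:numel(S)
        f = fieldnames(S{i});
        names = [names; f(~ismember(f, names))];
    end

    out = struct();
    for k = 1:numel(names)
        name = names{k};
        vals = {};                         % non-empty values of this field
        idx  = [];                         % element of S each value came from
        for i = 1:numel(S)
            if isfield(S{i}, name) && ~isempty(S{i}.(name))
                vals{end+1} = S{i}.(name);
                idx(end+1)  = i;
            end
        end

        if ~isempty(vals) && all(cellfun(@isstruct, vals))
            % Struct field: split into 1x1 structs, remember parents, recurse.
            children = {};
            parent   = [];
            for j = 1:numel(vals)
                children = [children, num2cell(vals{j}(:).')];
                parent   = [parent, repmat(idx(j), 1, numel(vals{j}))];
            end
            out.(name) = flattenStruct(children);
            out.([name '_index']) = parent;
        else
            % Leaf field: concatenate, and index only if some were missing.
            out.(name) = [vals{:}];
            if numel(idx) < numel(S)
                out.([name '_index']) = idx;
            end
        end
    end
end
```

Calling `B = flattenStruct(A)` on the example above should give the target layout, e.g. `B.c_index` is `[1 2 2 2]` and `B.c.key2_index` is `[2 4]`. The sketch is only meant to show the indexing logic, not to be fast.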

That works, but unfortunately it takes too long. When I profiled it, the MEX distribution function was taking up all the time. I'm guessing that's because I call parfor at every level, so MATLAB has to index the data and distribute it to the workers at every level.

My attempt 2: I tried to survey the whole structure with parfor first, storing a uint8 value for each field. Then, at the combination stage, I use vertcat on the survey results to check whether I need to index a field and whether I need to do cat(3,...) for the data field. But that is memory inefficient and slow at the survey stage, and it doesn't speed up the combination stage much, though indexing becomes much easier.

I guess my questions are:
1. How can I code it so that parfor indexes and distributes the whole array only once, making my first attempt more efficient? Or is my second attempt a better idea?
2. What is a good general approach to the problem?

Maroon66

1 Answer

My two cents on your second question: MATLAB's parfor works faster on simple arrays/matrices, because arrays are allocated contiguously in memory, which enables faster access and computation. So, instead of complex structures, I would suggest using simpler arrays if you're more concerned with performance than with readability.

xeroqu
  • I see what you're saying, but I am stuck with what I am given. The crazy/inefficient structure I am trying to vectorize is given to me, not created by me. – Maroon66 Oct 27 '15 at 10:41
  • You could also use simple data structures to do the calculations and once you're done, you can convert those into this structure. – xeroqu Oct 27 '15 at 10:49
  • OK. Maybe I don't get what you're saying.. My situation is that I am given the complex crazy asymmetric partial empty structure and I want to convert to the simple structure of arrays so I can save and do calculation easily. – Maroon66 Oct 27 '15 at 12:14
  • Have you looked at `cell2mat`? – xeroqu Oct 27 '15 at 12:54
  • I've used cell2mat extensively in the recursive vectorization for my first attempt. And I still don't get what you're saying.. cell2mat won't do anything to the structure. – Maroon66 Oct 27 '15 at 13:13
  • I meant this: Instead of having `A` and `A.c` as structures, work directly with A's fields in your program. For instance, `A.c.key = [5 3 4 7]` becomes `key = [5 3 4 7]`, etc. Does this make sense? – xeroqu Oct 27 '15 at 15:07
  • Ah. I see what you're saying. I can go in, like you say.. to key and do calculation there, by using an arrayfun on the original A structure. However, the problem is that the example here is only the tip of an iceberg. The structure I'm given is 5-7 layers deep with huge amount of data. I can't even save in v7 format, and v7.3 will go up to 10-20GB of space. But if I convert everything to the simple structure, I can save quickly and do calculation. – Maroon66 Oct 27 '15 at 16:21
  • ... and you have already tried `struct2cell` I assume? Well, I'm not aware of a Matlab function that can do exactly what you're looking for. Maybe someone else can help further - or you may need to write your own function for conversion. Good luck! – xeroqu Oct 27 '15 at 17:29
  • Thanks. I stayed away from struct2cell because some structs are not symmetrical, like the key fields in the example: in A(1) there is only key, while in A(2) there are key and key2. So I would have to keep track of the fieldname index for each cell if I were to use struct2cell. – Maroon66 Oct 27 '15 at 18:23