
How do I store Matlab arrays located in a 'struct within struct within struct' into a database so that I can then retrieve the fields and arrays?

More detail on why I need this:

I have tons of data saved as .mat files. The hassle is that I need to load a complete .mat file before I can start manipulating and plotting the data in it. If the file is large, just loading it into memory becomes quite a task.

These .mat files result from the analysis of raw electrical measurement data of transistors. All .mat files have the same structure, but each file corresponds to a different, unique transistor.

Now say I want to compare a certain parameter across all transistors that have A and B in common: I have to manually find and load all the .mat files I need and then try to do the comparison. There is no simple way to merge all of these .mat files into a single .mat file (since they all have the same variable names but with different data). Even if that were possible, I know of no way to query specific entries from .mat files.

I do not see a way of doing that easily without a structured database from which I can query specific entries. With one, I could use any programming language (continue with Matlab or switch to Python) to conveniently do the comparison, plotting, etc. without the hassle of the scattered .mat files.

The problem is that the data in the .mat files is organised in structs and large arrays. From what I know, storing that in a simple SQL database is not a straightforward task. I looked into HDF5, but from the examples I saw, I would need a lot of low-level commands to store those structs in an HDF file, and I am not sure whether I can load parts of the HDF file into Matlab/Python or whether I have to load the whole file into memory first.

The goal here is to merge all existing (and to-be-created) .mat files (with their compound data structure of structs and arrays) into a single database file from which I can query specific entries. Is there a database solution that can preserve the structure of my complex data? Is HDF the way to go? Or is there a simple solution I am missing?

EDIT:

Example of the data I need to save and retrieve:

All(16).rf.SS(3,2).data

Where All is an array of structs with 7 fields. The rf field of each element is a struct containing arrays, integers, strings and structs. One of those structs is named SS, which in turn is an array of structs, each containing a 2x2 array named data.

Ahmad Khaled
  • Outside the Matlab world, this is known as an ORM, or Object Relation Mapper. – MSalters Apr 29 '19 at 10:49
  • Thanks @MSalters! I looked it up but couldn't find anything related to Matlab. Would it be logical to import the .mat files into Python and then save them with an object-oriented database? Then continue using Python from there? – Ahmad Khaled Apr 29 '19 at 11:08
  • Not an expert here, but it makes sense. Python (with SciPy/NumPy) is serious competition for MatLab, precisely because it integrates better with the rest of the world. – MSalters Apr 29 '19 at 11:20
  • True. I am bound to the .mat files for now, though. I will look up (a) how to import .mat files into Python and (b) how to save those imported .mat files with a Python ORM. If I can get both right, that would solve the problem. – Ahmad Khaled Apr 29 '19 at 11:24
  • Can you show us an example of the 'struct within struct within struct' data? Going to Python may make it possible to store and retrieve the .mat files but it doesn't mean you can search and filter on the values of fields within those files. If it's important that you can do that, you may be better off focusing on reorganising the data in MATLAB so that you can either use native data structures (see my answer) or get it into a database-friendly table format. – nekomatic Apr 29 '19 at 11:39
  • Indeed, filtering on the different fields is what I need. I am assuming the ORM solution enables that? I added an example on the data I need to store and fetch to the question. – Ahmad Khaled Apr 29 '19 at 11:51
  • OK, I'll rephrase: the Python/ORM solution may let you search and filter the data as you want (I'm not expert in that area) but I suspect you will still need to do significant work reorganising the data in Python in order to enable it. I suggest you start by experimenting with reading your .mat data into Python and deciding whether it looks easier to continue there or just reorganise it all within MATLAB. – nekomatic Apr 29 '19 at 12:24

1 Answer


Merge .mat files into one data structure

In general it's not correct that "there is no simple way to merge ... .mat files into a single .mat file (since they all have the same variable names but with different data)".

Let's say you have two files, data1.mat and data2.mat and each one contains two variables, a and b. You can do:

>> s = load('data1')
s = 
  struct with fields:

    a: 'foo'
    b: 3

>> s(2) = load('data2')
s = 
  1×2 struct array with fields:
    a
    b

Now you have a struct array (see note below). You can access the data in it like this:

>> s(1).a
ans =
    'foo'

>> s(2).a
ans =
    'bar'

But you can also get all the values at once for each field, as a comma-separated list, which you can assign to a cell array or matrix:

>> s.a
ans =
    'foo'
ans =
    'bar'

>> allAs = {s.a}
allAs =
  1×2 cell array
    {'foo'}    {'bar'}

>> allBs = [s.b]
allBs =
     3     4
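
For nested data like the All(16).rf.SS(3,2).data example in the question, the comma-separated list only works one level at a time, so something like arrayfun is one way to pull the same nested value out of every element. The following is only a sketch: the struct built here is a small hypothetical stand-in for the real data (two elements instead of 16, and an invented name field), not the asker's actual layout.

```matlab
% Hypothetical stand-in for the question's data: 2 elements instead of 16
All = struct('rf', cell(1, 2));            % 1x2 struct array, rf initially empty
for k = 1:2
    SS = struct('data', cell(3, 2));       % 3x2 struct array with empty data fields
    SS(3,2).data = k * eye(2);             % the 2x2 array we want to retrieve later
    All(k).rf = struct('name', sprintf('T%d', k), 'SS', SS);
end

% Pull one nested value out of every element of All
names  = arrayfun(@(e) e.rf.name, All, 'UniformOutput', false);
blocks = arrayfun(@(e) e.rf.SS(3,2).data, All, 'UniformOutput', false);

% Stack the 2x2 arrays along a third dimension for easy comparison
stacked = cat(3, blocks{:});
```

Here stacked(:,:,k) is the data array of the k-th element, so comparisons across transistors become ordinary array operations.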

Note: Annoyingly, it seems you have to create the struct with the correct fields before you can assign to it using indexing. In other words

s = struct;            % 1x1 struct with no fields
s(1) = load('data1')   % fails: the loaded struct has fields a and b, s has none

won't work, but

s = struct('a', [], 'b', [])
s(1) = load('data1')

is OK.
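
With more than a couple of files you would probably do this in a loop. The sketch below assumes every file contains the same variables (as in the question); the temporary folder, the two demo files and the sourceFile field are made up for illustration.

```matlab
% Demo setup (hypothetical): two small files with the same variable names
dataDir = tempname;
mkdir(dataDir);
a = 'foo'; b = 3; save(fullfile(dataDir, 'data1.mat'), 'a', 'b');
a = 'bar'; b = 4; save(fullfile(dataDir, 'data2.mat'), 'a', 'b');

% Merge every .mat file in the folder into one struct array
files = dir(fullfile(dataDir, '*.mat'));
for k = 1:numel(files)
    item = load(fullfile(dataDir, files(k).name));  % struct with one field per variable
    item.sourceFile = files(k).name;                % remember which file it came from
    if k == 1
        merged = item;       % the first file defines the field names
    else
        merged(k) = item;    % later files must have the same fields
    end
end

allBs = [merged.b];          % one value per file, as in the example above
```

Letting the first file define the fields sidesteps the pre-allocation issue from the note, at the cost of growing the array inside the loop.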

Build an index to the .mat files

If you don't need to be able to search on all of the data in each .mat file, just certain fields, you could build an index in MATLAB containing just the relevant metadata from each .mat file plus a reference (e.g. filename) to the file itself. This is less robust as a long-term solution as you have to make sure the index is kept in sync with the files, but should be less work to set up.
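
A minimal sketch of such an index, assuming the value you filter on is stored as a top-level variable in each file (here the variable b from the earlier example). The folder, the demo files and the threshold are hypothetical. It relies on load(filename, 'b') reading only the named variable rather than the whole file.

```matlab
% Demo setup (hypothetical): two files with variables a and b
dataDir = tempname;
mkdir(dataDir);
a = 'foo'; b = 3; save(fullfile(dataDir, 'data1.mat'), 'a', 'b');
a = 'bar'; b = 4; save(fullfile(dataDir, 'data2.mat'), 'a', 'b');

% Build the index: read only the variable used for filtering from each file
files = dir(fullfile(dataDir, '*.mat'));
index = struct('file', {}, 'b', {});       % empty struct array with the right fields
for k = 1:numel(files)
    filePath = fullfile(dataDir, files(k).name);
    meta = load(filePath, 'b');            % loads variable b only
    index(k).file = filePath;              % reference back to the full data
    index(k).b = meta.b;
end

% Keep the index in its own file, outside the data folder
indexFile = [tempname '.mat'];
save(indexFile, 'index');

% Query the index, then load only the files that match
hits = index([index.b] > 3);
for k = 1:numel(hits)
    d = load(hits(k).file);
    fprintf('%s: a = %s, b = %d\n', hits(k).file, d.a, d.b);
end
```

The index has to be rebuilt or updated whenever files are added or changed, which is the synchronisation cost mentioned above.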

Flatten the data structure into a database-compatible table

If you really want to keep everything in a database, then you can convert your data structure into a tabular form where any multi-dimensional elements such as structs or arrays are 'flattened' into a table row with one scalar value per (suitably-named) table variable.

For example if you have a struct s with fields s.a and s.b, and s.b is a 2 x 2 matrix, you might call the variables s_a, s_b_1_1, s_b_1_2, s_b_2_1 and s_b_2_2 - probably not the ideal database design, but you get the idea.
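
As a rough sketch of that naming scheme (not a replacement for the File Exchange submissions mentioned below), the loop here flattens scalar structs and 2-D numeric matrices. The extra nested field c.d is invented to show nesting. Anything else, such as struct arrays or N-D arrays, is stored unchanged, and empty matrices are dropped.

```matlab
% Example input: s.a is a scalar, s.b is a 2 x 2 matrix, s.c is a nested struct
s = struct('a', 1, 'b', [1 2; 3 4], 'c', struct('d', 'text'));

flat = struct();
queue = {'s', s};                 % rows of {name, value} still to be processed
while ~isempty(queue)
    name  = queue{1, 1};
    value = queue{1, 2};
    queue(1, :) = [];             % remove the item we are handling
    if isstruct(value) && isscalar(value)
        % Scalar struct: queue each field under a longer name
        fields = fieldnames(value);
        for k = 1:numel(fields)
            queue(end + 1, :) = {[name '_' fields{k}], value.(fields{k})};
        end
    elseif isnumeric(value) && ~isscalar(value) && ismatrix(value)
        % 2-D numeric array: one scalar field per element, named by row and column
        for r = 1:size(value, 1)
            for c = 1:size(value, 2)
                flat.(sprintf('%s_%d_%d', name, r, c)) = value(r, c);
            end
        end
    else
        flat.(name) = value;      % scalars, strings and anything not handled above
    end
end

row = struct2table(flat);         % one table row, one variable per flattened name
```

For this input the flattened names are s_a, s_b_1_1, s_b_1_2, s_b_2_1, s_b_2_2 and s_c_d. Rows built this way from several files can be stacked into one table, provided every file produces the same set of names.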

You should be able to adapt the code in this answer and/or the MATLAB File Exchange submissions flattenstruct2cell and flatten-nested-cell-arrays to suit your needs.

nekomatic
  • It is indeed possible to do in Matlab, and I have been doing exactly that so far. However, when you have hundreds of .mat files and just want to compare data among 4 or 5 of them, I can either load them all (memory explodes) or go and pick the ones I need (which is time-consuming, since I have to go back and check the specs of each one manually to see whether I need it for the comparison). That is why I have been thinking of a solution to query specific entries based on certain conditions. I do not know a way of doing that with .mat files. – Ahmad Khaled Apr 29 '19 at 11:57
  • Will you potentially need to search on any of the data in each .mat file, or is it only certain fields? If the latter, you could build an index in MATLAB containing just the relevant metadata from each .mat file plus a reference (e.g. filename) to the file itself. – nekomatic Apr 29 '19 at 12:26
  • Will need to search certain fields to filter, then acquire other fields for data. The index + reference idea sounds great and less work to do: Having one .mat file with all metadata required for filtering and a reference indicating the path to the corresponding mat file to fetch the data afterwards. I like it (: Thanks! – Ahmad Khaled Apr 29 '19 at 12:45
  • I am just confused that there isn't any solution out there for storing structured arrays of data into a database that one can run selective queries on. It is quite needed in research. Or is that only Matlab lacks such a feature? – Ahmad Khaled Apr 29 '19 at 12:48
  • Edited my answer with the index + reference suggestion. It's quite likely there are solutions for this sort of application that I don't know of, but I guess there's a limited range of data for which they would be useful: if the data structure is predictable it's not hard to organise it into table form for a standard database, but if it's highly variable then I guess it becomes hard to index in a meaningful way. I don't think MATLAB is great for data input and output though. – nekomatic Apr 29 '19 at 13:39
  • Thanks @nekomatic! My data structure is fixed, yet I am not able to use a standard database (I do not know of a way to store the structs and arrays there). I am trying both solutions (ORM/python and your workaround) in parallel..Will mark your workaround as 'accepted' in a couple of days if there is no direct answer to my question. – Ahmad Khaled Apr 30 '19 at 08:33