
I have a question. I suspect one of the answers is "don't use Matlab", but I'd prefer to find other ones as well.

I have an unknown number of data sets. Each set is identified by a numeric id (a positive integer). I don't choose the id. I want to sort these data sets by id, but the ids are not consecutive. For example:

[3 9 17 35 69 101]

When I get a new data set from my data stream, its id can take any value, for example 19. I then of course want to insert it between "17" and "35".

One dirty solution is to store the data sets in a cell array. Say N is my total number of data sets so far and Ind is the index of one data set. If I get an (N+1)th data set whose id falls right in the middle, I have to shift up all the data sets with a bigger index. As you can imagine, this is not very efficient.
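To make the cost concrete, here is a minimal sketch in Python (used purely for illustration; a Matlab cell array behaves the same way): inserting into the middle of a contiguous, sorted container shifts every later element, so each insert costs O(N).

```python
# Sketch of the "dirty" approach: a sorted, contiguous container of ids.
# bisect finds the insertion point quickly, but insert() still has to shift
# every element after that point, which is O(N) per insertion.
import bisect

ids = [3, 9, 17, 35, 69, 101]
new_id = 19
pos = bisect.bisect_left(ids, new_id)  # binary search for the insertion point
ids.insert(pos, new_id)                # shifts everything after index pos
print(ids)  # [3, 9, 17, 19, 35, 69, 101]
```

With 10000 sets arriving in arbitrary id order, this shifting happens on nearly every insert, which is exactly the inefficiency described above.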

Then I remembered one of the rare programming classes I had (I'm a physicist). It dealt with linked lists. The solution there is simple: I just need to point the previous set to my new data set and add a pointer from it to the next one: much more efficient.
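The linked-list idea can be sketched as follows, again in Python rather than Matlab's dlnode (the `Node` class and `insert_sorted` helper are hypothetical names for illustration): finding the insertion point still requires walking the list, but the splice itself is only two pointer updates, with no shifting of elements.

```python
# Hypothetical singly linked list kept sorted by ascending id.
class Node:
    def __init__(self, id_, nxt=None):
        self.id = id_
        self.next = nxt

def insert_sorted(head, id_):
    """Insert id_ into an ascending list; return the (possibly new) head."""
    if head is None or id_ < head.id:
        return Node(id_, head)
    cur = head
    while cur.next is not None and cur.next.id < id_:
        cur = cur.next              # O(N) walk to find the predecessor
    cur.next = Node(id_, cur.next)  # O(1) splice: two pointer updates
    return head

# Build [3, 9, 17, 35, 69, 101], then insert 19 between 17 and 35
head = None
for i in reversed([3, 9, 17, 35, 69, 101]):
    head = Node(i, head)
head = insert_sorted(head, 19)
```

Note that the walk to the predecessor is still O(N); only the insertion itself becomes cheap.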

There is no built-in Matlab class for this, but there is the class example called dlnode. Searching online a bit more on this topic, my first hit was sadly this: http://abandonmatlab.wordpress.com/category/thirty-misfeature-pileup/ Basically, the author shows that building a doubly linked list of 510 elements and then clearing the two list variables produces a bunch of warnings. I tried it myself and got the same warnings when clearing the two variables. I also tried to save the variables (right click > save as): it worked for 510 sets in the linked list but crashed Matlab for 10000 (note: the data saved per set there is only one numeric value; in my case it is about a thousand characters per set, and I expect 10000 sets minimum). So linked lists don't seem to be the solution in Matlab.

Have you already faced this problem? Could you come up with a more efficient solution?

Thanks for your help.


Edit: I think I need to be more precise. I have data sets looking like this:

id: 89 % positive integer used as id
data: 'xxxxxx' % several lines of strings

I have a stream of data where an id is specified. I have no control over this id; I only know that it is a positive integer. An incoming id can be smaller than, bigger than, or even equal to an id I already got before.

In the case where I get the same id again, the data are different. I simply want to append the new lines of strings (say "yyyyyy") to my previously saved data:

id: 89 % positive integer used as id
data: 'xxxxxxyyyyyy' % several lines of strings

But this is not the hard part.

What is hard is that I want to sort them into some kind of data structure by increasing id, so that it is much easier to find my data again later. But I still want my code to be efficient when adding a new data set to the whole collection. Preallocating a cell array is possible, but it doesn't solve the problem of getting a data set with an id smaller than the biggest id obtained so far (which requires shifting the indices of all those data sets).

A solution I'm thinking of is to append data without caring whether the new id I get is bigger or smaller...

hyamanieu

1 Answer


Matlab use do not? =)

I'm currently helping physicists transition from Matlab to Python. In Python I'd use a dictionary to keep track of your datasets, so I did a quick search for "dictionaries in Matlab" and found "How to use Hash Tables (dictionaries) in MATLAB?", which may work for you.
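Here is a hedged sketch of the dictionary idea in Python (the `add_dataset` helper is a made-up name for illustration; Matlab's containers.Map offers the same key/value behavior): the id is the key, a duplicate id simply appends to the stored string, and inserts are cheap regardless of id order.

```python
# Dictionary keyed by id; duplicate ids append their data to the stored
# string. Inserts and lookups are amortized O(1), so id order doesn't matter.
datasets = {}

def add_dataset(datasets, id_, data):
    # .get with a default handles both new ids and repeated ids
    datasets[id_] = datasets.get(id_, '') + data

add_dataset(datasets, 89, 'xxxxxx')
add_dataset(datasets, 17, 'aaaa')
add_dataset(datasets, 89, 'yyyyyy')   # same id: data is appended

print(datasets[89])       # 'xxxxxxyyyyyy'
print(sorted(datasets))   # [17, 89] -- sorted ids recovered on demand
```

Sorting by id is then deferred to retrieval time (sort the keys once when needed) instead of being maintained on every insert.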

EDIT: Link to the question my original link was a duplicate of: "How to use Hash Tables (dictionaries) in MATLAB?"

physicsmichael
  • Hello, one of the solutions given in that topic is to use containers.Map. I had already thought of using that, but when I saw it, I thought it was also a classdef based on handle (like dlnode), so I got a bit scared... Perhaps I shouldn't be, because it seems to be a built-in class. I'll try that and keep you posted. – hyamanieu Mar 26 '14 at 10:09
  • Hello, I just tried to load "tons" of data with map containers: it was definitely not the bottleneck. And it does seem to be a hash table indeed. A hash table was definitely the answer; now I remember I had a lecture about that too ^^ Such a long time ago... Thanks :) – hyamanieu Mar 29 '14 at 06:56