0

Does MATLAB have a data structure to which elements can be added and removed without copying all of data?

AFAIK it lacks a list (like in Python), so I'm trying to implement it, but unsure what to build it on. I figured cell array, but the linter suggests it's actually an array. One could implement it the long way where each element is a separate class, using handle classes as containers as pointers, but I'm guessing that's not efficient in MATLAB.

struct? Extend a cell array each time its length is exceeded? More important than not copying the list is to not copy its contents, namely arrays.

Edit: I implemented Pythonic list and dict based on cellarray and containers.Map, here.

OverLordGoldDragon
  • 1
  • 9
  • 53
  • 101
  • 1
    What do you want to put in the list? More recent versions of MATLAB double the allocated space for arrays when appending individual elements in a loop. See here: https://stackoverflow.com/q/48351041/7328782 – Cris Luengo Aug 30 '22 at 01:15
  • I’m thinking this *might* answer your question, but I’m not sure: https://stackoverflow.com/q/50257074/7328782 – Cris Luengo Aug 30 '22 at 01:16
  • @CrisLuengo Lists should be able to store almost anything, but let's say numeric and string arrays. My peak concern is copy behavior: manipulating the _list_ should not copy its elements. I'm aware of your first post, and second post is my [current implem](https://pastebin.com/HtZ5ffCf), but I'm uncertain about `lst{end + 1} = x`. Is there a reference page for every case where MATLAB _doesn't_ copy? – OverLordGoldDragon Aug 30 '22 at 19:49
  • 1
    MATLAB never copies! Well, sort of. MATLAB uses "lazy copy": `b=a` creates a copy of matrix `a`, but without copying the data. Only when you modify either `a` or `b` is the copy made. So doing `c = {a,b}` puts `a` and `b` in a cell array, but doesn't copy any data. The cell array is just pointing to the same matrices as the variables `a` and `b` (until you attempt to modify one of those arrays). `c{end+1} = x` might reallocate the array `c`, but the array `c` is just an array of pointers, so this should be cheap. It will never cause a copy of the matrices in the cell array. – Cris Luengo Aug 30 '22 at 20:41
  • 1
    In your implementation, I would have `extend(self, x)` first assign to `self.data{end+length(x)}`, so that you don't repeatedly change the size of the array. I would also use `numel` instead of `length` everywhere, because `length` is more expensive (it's defined as `max(size(.))`). And `size` should take more input arguments. You can do `function out = size(self, varargin); out = size(self.data, varargin{:}); end`. Oh, and it also takes multiple output arguments. You want to make sure it behaves like [the builtin method](https://www.mathworks.com/help/matlab/ref/size.html). – Cris Luengo Aug 30 '22 at 20:47
  • @CrisLuengo This extends my understanding based on [this](https://www.mathworks.com/help/matlab/matlab_prog/avoid-unnecessary-copies-of-data.html), good to confirm. Thanks also for taking look at my code: I overloaded all "size-ness" operators just for whatever calls on them; good catch on `extend`. I'm thinking redoing all that using `dlnode` for sake of long lists. – OverLordGoldDragon Aug 30 '22 at 20:51
  • 1
    Make sure you time your use cases with both implementations. I have the gut feeling that `dlnode` is orders of magnitude slower than a simple cell array. Even in C++ it is often better to use a plain array over a linked list, in MATLAB this difference must be much more significant. I might be wrong, don't take my word for it, time your code! – Cris Luengo Aug 30 '22 at 21:14
  • @CrisLuengo I certainly had this concern, also on how big an "empty" MATLAB class is (ones in Python come with lots of baggage, don't want `1e7` instances of those). I might still be a little paranoid on copies and `dlnode` appears safer. May be best to just roll with cellarray and revisit if it disappoints. Briefly, do your copy descriptions also hold of [dlarray](https://www.mathworks.com/help/deeplearning/ref/dlarray.html) packed in cellarray? – OverLordGoldDragon Aug 30 '22 at 21:24
  • 1
    Yes, MATLAB does the lazy copy for everything. – Cris Luengo Aug 30 '22 at 21:58
  • @CrisLuengo I made an implementation for list and dict [here](https://github.com/OverLordGoldDragon/PyInMat), thanks for the help so far. Any further feedback is welcome and appreciated. -- Also you may notice a strange code structure (emulating Python imports), I thought of making an [SE](https://softwareengineering.stackexchange.com/) post, but dunno if there's much to be said - [in one pic](https://i.stack.imgur.com/IIWNL.png). – OverLordGoldDragon Sep 02 '22 at 02:40

1 Answers1

1

Mathworks themselves have an example linked-list implementation, where each elements seemed to be an instance of a class. It's bound to be less efficient than a uniform array in many operations, but could be ideal for many others.

Alternatively, you can relay on Java LinkedLists like discussed in here. Whether performance improves depends on the actual usage scenario.

I guess the most efficient way will be using MEX function to call C routines, save for hardware specific optimizations. That's likely to be far-fetched for most applications that starts in MATLAB.

X Zhang
  • 1,081
  • 7
  • 17
  • Yeah, that example is what I had in mind. If they're doing that then maybe it's practical. Java appears [limited](https://stackoverflow.com/q/14485432/10133797) in handling MATLAB types. I'll give it a go, thanks. – OverLordGoldDragon Aug 30 '22 at 20:21