Calling MATLAB from Python is bound to incur some performance penalty that I could avoid by rewriting (a lot of) code in Python. That isn't a realistic option for me, however, and it annoys me that a huge loss of efficiency lies in the simple conversion from a numpy array to a MATLAB double.

I'm talking about the following conversion from data1 to data1m, where

import numpy as np
import matlab

data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))
data1m = matlab.double(list(data1))

Here matlab.double comes from MathWorks' own MATLAB package/engine. The second line of code takes 20 s on my system, which just seems like too much for a conversion that doesn't really do anything other than making the numbers 'edible' for MATLAB.
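
For reference, a minimal timing harness for that conversion (my sketch, not part of the original setup; the absolute number is of course machine-dependent):

import time
import numpy as np
import matlab

data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))

t0 = time.perf_counter()
data1m = matlab.double(list(data1))  # the conversion being measured
print('conversion took {:.1f} s'.format(time.perf_counter() - t0))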

So basically I'm looking for a trick opposite to the one given here that works for converting MATLAB output back to Python.
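
(The trick referenced above is, as far as I can tell, to read the returned mlarray's internal buffer and undo MATLAB's column-major layout, roughly as below; note that _data is an undocumented attribute, so this may differ between releases:)

import numpy as np

# res is a matlab.double returned by the engine;
# its flat buffer is in Fortran (column-major) order
np_res = np.array(res._data).reshape(res.size, order='F')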

5Ke
  • I'm using Matlab 2022a with Python 3.9 and the problem seems to be non-existent - i.e. one can convert from Numpy array to Matlab double without requiring any list. – Des Jun 15 '22 at 12:55

3 Answers

Passing numpy arrays efficiently

Take a look at the file mlarray_sequence.py in the folder PYTHONPATH\Lib\site-packages\matlab\_internal. There you will find the construction of the MATLAB array object. The performance problem comes from copying data with loops within the generic_flattening function.

To avoid this behavior we will edit the file a bit. This fix should work on complex and non-complex datatypes.

  1. Make a backup of the original file in case something goes wrong.

  2. Add import numpy as np to the other imports at the beginning of the file

  3. In line 38 you should find:

    init_dims = _get_size(initializer)
    

    replace this with:

    try:
        # NumPy arrays expose their dimensions directly via .shape
        init_dims = initializer.shape
    except AttributeError:
        # fall back to the original size computation for plain sequences
        init_dims = _get_size(initializer)
    
  4. In line 48 you should find:

    if is_complex:
        complex_array = flat(self, initializer,
                             init_dims, typecode)
        self._real = complex_array['real']
        self._imag = complex_array['imag']
    else:
        self._data = flat(self, initializer, init_dims, typecode)
    

    Replace this with:

    if is_complex:
        try:
            # fast path: ravel in Fortran order to match MATLAB's column-major layout
            self._real = array.array(typecode, np.ravel(initializer, order='F').real)
            self._imag = array.array(typecode, np.ravel(initializer, order='F').imag)
        except Exception:
            # fall back to the original element-by-element flattening
            complex_array = flat(self, initializer, init_dims, typecode)
            self._real = complex_array['real']
            self._imag = complex_array['imag']
    else:
        try:
            self._data = array.array(typecode, np.ravel(initializer, order='F'))
        except Exception:
            self._data = flat(self, initializer, init_dims, typecode)
    

Now you can pass a numpy array directly to the MATLAB array creation method.

data1 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))
# faster: pass the ndarray directly
data1m = matlab.double(data1)
# or slower method, via an intermediate list
data1m = matlab.double(data1.tolist())

data2 = np.random.uniform(low=0.0, high=30000.0, size=(1000000,)).astype(np.complex128)
# faster
data2m = matlab.double(data2, is_complex=True)
# or slower method
data2m = matlab.double(data2.tolist(), is_complex=True)

The performance of MATLAB array creation increases by a factor of 15, and the interface is easier to use now.
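
A quick sanity check of the patched conversion (my addition, not from the original answer; np.array iterates the mlarray element by element, so keep the test array small):

import numpy as np
import matlab

small = np.random.uniform(low=0.0, high=30000.0, size=(1000,))
m = matlab.double(small)
# matlab.double stores a 1xN array, hence the ravel before comparing
assert np.allclose(np.array(m).ravel(), small)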

max9111
  • Thanks for this suggestion! It seems really promising as conversion time gets reduced from 40 s to 0.6 s. However, when I use them as input, I now get a `Segmentation fault (core dumped)` error. Stepping through shows that while calling the function (rather than the conversion), `future = pythonengine.evaluateFunction(...)` [Line 77 in matlabengine.py], __init__ of _MLArrayMetaClass gets called again, and now it trips over the altered line 38: AttributeError: 'double' object has no attribute 'shape'. Maybe it tries to initialize the function's output here? – 5Ke Jul 25 '17 at 07:36
  • I have updated my code now. At least with something like b=engine.sqrt(data2m) it is working now. – max9111 Jul 25 '17 at 11:45
  • Yes, now it works! Your method reduces the total conversion time from 40 s to less than 0.5 s! :-) It doesn't reduce computational time of the script itself - which makes complete sense, but it again makes me wonder why saving/loading to a .mat file does. – 5Ke Jul 26 '17 at 07:40
  • How do you measure computational time of the script? Do you measure the time on the MATLAB or on the Python side? Note that only the array conversion to a MATLAB mlarray object was improved, not the transfer to or from the MATLAB engine itself (if they make a copy here instead of passing a pointer). – max9111 Jul 26 '17 at 11:15
  • In both cases I measure the time of the matlab script from Python, so that it includes the transfer. Wouldn't saving it to a .mat file be more like making a copy than like making a pointer too? – 5Ke Jul 26 '17 at 14:13
  • Yes, but obviously a more efficient one ;). Please note also: if you are saving and loading a relatively small .mat file, the file is cached in memory by your operating system, so there is no overhead in disk I/O. If the file gets larger, this can look a bit different. I couldn't look into the compiled interface code, but if MathWorks did a similar job as in the Python interface, I am not surprised that saving and loading is more efficient. – max9111 Jul 26 '17 at 14:31
  • @max9111 thank you for this interesting answer! Could you also mention (or better yet, add) the modifications that are required for passing complex arrays? – Dev-iL Oct 07 '18 at 13:56
  • @Dev-iL I do not have a valid Matlab license right now. It shouldn't be too difficult (just an extraction of the real and imaginary part). I can give a guess how it works, but somebody has to test it... – max9111 Oct 07 '18 at 17:09
  • @max9111 I will happily test this. Please contact me on [SO Chat](https://chat.stackoverflow.com/rooms/81987/chatlab-and-talktave) if you get a chance to work on this. – Dev-iL Oct 07 '18 at 18:17
  • @max9111 I have modified the code as you mentioned, but now I have an error at the `init_dims = _get_size(initializer)` line; the error is: ValueError: initializer must be a rectangular nested sequence – JCV Jun 16 '20 at 22:14
  • Hello, I have changed the code to the one above and now I have an error in the mlarray_utils.py function when passing a scalar. The problem is I forgot to make a copy of the original file. Would someone send me the original file? Or does someone know how to fix this problem? File "C:\ProgramData\Anaconda3\lib\site-packages\matlab\_internal\mlarray_utils.py", line 90, in _normalize_size: if init_dims[0] == 0: IndexError: tuple index out of range – JCV Jun 22 '20 at 11:42
  • If `data1` is a 2D array then `matlab.double(data1.tolist())` is inevitable – seralouk Nov 01 '21 at 23:19
  • Indeed, the `matlab.double` is much faster after these modification however, in my case, the actual execution of the matlab function using the engine is EXTREMELY slow. I found this (https://stackoverflow.com/a/45284125/5025009) helpful. – seralouk Nov 02 '21 at 07:08

While awaiting better suggestions, I'll post the best trick I've come up with so far. It comes down to saving the data with `scipy.io.savemat` and then loading that file in MATLAB.

This is not the prettiest hack and it requires some care to ensure different processes relying on the same script don't end up writing and loading each other's .mat files, but the performance gain is worth it for me.

As a test case I wrote two simple, almost identical MATLAB functions that take two numpy arrays (I tested with length 1000000) and one int as input.

function d = test(x, y, fs_signal)
d = sum((x + y))./double(fs_signal);

function d = test2(path)
load(path)
d = sum((x + y))./double(fs_signal);

The function test requires conversion, while test2 requires saving.
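
For completeness, the Python side of test2 then looks roughly like this (a sketch under my assumptions; the engine handle eng, the variable values, and the per-process file name are mine, not part of the original setup):

import os
import tempfile
import numpy as np
import scipy.io
import matlab.engine

eng = matlab.engine.start_matlab()

x = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))
y = np.random.uniform(low=0.0, high=30000.0, size=(1000000,))

# one .mat file per process, so concurrent runs don't read each other's data
path = os.path.join(tempfile.gettempdir(), 'test2_{}.mat'.format(os.getpid()))
scipy.io.savemat(path, {'x': x, 'y': y, 'fs_signal': 1000})

d = eng.test2(path)  # test2 load()s x, y and fs_signal from the file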

Testing test: converting the two numpy arrays takes circa 40 s on my system. The total time to prepare for and run test comes down to 170 s.

Testing test2: saving the arrays and the int takes circa 0.35 s on my system. Surprisingly, loading the .mat file in MATLAB is extremely efficient (or, more surprisingly, the engine is extremely inefficient at dealing with its own doubles)... The total time to prepare for and run test2 comes down to 0.38 s.

That's a performance gain of almost 450x...

5Ke
  • Perhaps writing your own C++ code may help. Converting data from Python to C++ should be quite easy with e.g. Cython, and then you can use MATLAB's MEX API to create a MATLAB variable and assign the same memory pointer as the Python (now C++) data. Both of these are certainly very fast (as it is just creating objects and assigning pointers) and should be a more elegant solution than relying on I/O. – Ander Biguri Jul 24 '17 at 15:42
  • Maybe this would help: https://github.com/kmatzen/matlab-python It's a wrapper for the matlab C interface which should give decent speed. – max9111 Jul 24 '17 at 19:18
  • Going over to C++ is a bit too daunting for now, although Cython definitely looks interesting. I guess it depends on the payback on the effort to implement this. Is there any chance that the matlab functions themselves will show improved performance too when switching to the mex API? – 5Ke Jul 25 '17 at 08:54
  • @max9111: Link is dead. – Eric Nov 06 '19 at 10:52
  • This is the only answer that really made the execution time go down. All the others are about optimizing loading and passing arrays but the point is that the matlab engine is slow by definition. This made the trick thanks – seralouk Nov 02 '21 at 07:53

My situation was a bit different (Python script called from MATLAB) but for me converting the ndarray into an array.array massively sped up the process. Basically it is very similar to Alexandre Chabot's solution but without the need to alter any files:

# untested, i.e. only deduced from my "MATLAB calls Python" situation
import array
import numpy
import matlab

data1 = numpy.random.uniform(low=0.0, high=30000.0, size=(1000000,))
# flatten in Fortran order to match MATLAB's column-major layout
ar = array.array('d', data1.flatten('F').tolist())
p = matlab.double(ar)
C = matlab.reshape(p, data1.shape)  # this part I am definitely not sure will work like that

At least when done from MATLAB, the combination of array.array and double is relatively fast. Tested with MATLAB 2016b + Python 3.5.4, 64-bit.
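
A possible refinement, with the same untested caveat as the code above: fill the array.array from the ndarray's raw buffer instead of going through a Python list, which avoids creating a million intermediate Python float objects:

import array
import numpy

data1 = numpy.random.uniform(low=0.0, high=30000.0, size=(1000000,))
ar = array.array('d')
# straight memory copy; typecode 'd' matches numpy's float64
ar.frombytes(data1.ravel(order='F').tobytes())
# ar can then be handed to matlab.double as before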

Christian B.
  • I can confirm that this method is x3-x5 faster than `double(py.array.array('d', py.numpy.nditer(data1)))` for transferring data from python to MATLAB. Way to go! +1. BTW, do you have an idea if and how it's possible to transfer python arrays to MATLAB without copying memory (as if passing a pointer)? – Dev-iL Oct 07 '18 at 20:43
  • @Dev-iL is this what you are looking for? I am still trying to get admin to install the MATLAB API into my `conda` environment so I can really speed things up, but this works for now: https://www.mathworks.com/matlabcentral/answers/216498-passing-numpy-ndarray-from-python-to-matlab#answer_487604 – brethvoice Aug 31 '20 at 20:21
  • @brethvoice I mostly work in MATLAB and I'm using some version of the [`matpy`](https://github.com/CoolProp/CoolProp/blob/master/wrappers/MATLAB/matpy.m) class to pass data back and forth. Of course, if you or anyone else have ideas on how to increase its performance, it would be greatly appreciated :) – Dev-iL Sep 01 '20 at 04:04
  • @Dev-iL I hypothesize that transforming the data into MATLAB's expected form *before* passing it back to MATLAB from Python will speed things up. That requires you to install and use the matlab package from within Python though; takes admin privileges so I have not done it yet. It may not work but this web page makes it look like it might: https://www.mathworks.com/help/compiler_sdk/python/matlab-arrays-as-python-variables.html – brethvoice Sep 01 '20 at 19:04
  • error: `ValueError: initializer must be a rectangular nested sequence` – ch271828n Sep 22 '21 at 01:05