I want to use a pre-trained MXNet model on the s390x architecture, but it doesn't seem to work. This is because the pre-trained models are stored in little-endian byte order, whereas s390x is big-endian. So I'm trying to use the numpy .npy format (https://numpy.org/devdocs/reference/generated/numpy.lib.format.html), which works on both little-endian and big-endian machines.
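
To make the byte-order mismatch concrete, here is a small standard-library sketch. It is not MXNet code, and the value 1.0 is only an illustration. It shows how the same four bytes of a float32 decode differently depending on the byte order the reader assumes:

```python
import struct

# Serialize the float32 value 1.0 the way a little-endian (x86) machine would.
raw = struct.pack("<f", 1.0)

# Reading the bytes back with the matching byte order recovers the value.
as_little = struct.unpack("<f", raw)[0]

# Reading the same bytes as big-endian (what a naive reader on s390x would do)
# yields a different, meaningless number.
as_big = struct.unpack(">f", raw)[0]

print(raw.hex(), as_little, as_big)
```

A file format that records the byte order alongside the data, as .npy does in its header, lets the reader pick the right interpretation.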

One way I've found to solve this is to load the model parameters on an x86 machine, call asnumpy, and save them through numpy. Then load the parameters on the s390x machine using numpy and convert them back to MXNet. But I'm not really sure how to code it. Can anyone please help me with that?

UPDATE

It seems the question is unclear. So, I'm adding an example that better explains what I want to do in 3 steps -

  1. Load a preexisting model from MXNet, something like this -
net = mx.gluon.model_zoo.vision.resnet18_v1(pretrained=True, ctx=mx.cpu())
  2. Export the model. The following code saves the model parameters in a .params file, but this binary file has endian issues. So, instead of saving the model directly with the MXNet API, I want to save the parameters using numpy - https://numpy.org/devdocs/reference/generated/numpy.lib.format.html - because that would make the binary file (.npy) endian-independent. I am not sure how I can convert the parameters of an MXNet model into numpy format and save them.
gluon.contrib.utils.export(net, path="./my_model")
  3. Load the model. The following code loads the model from the .params file.
net = gluon.contrib.utils.import(symbol_file="my_model-symbol.json",
                                     param_file="my_model-0000.params",
                                     ctx = 'cpu')

Instead of loading using the MXNet API, I want to use numpy to load the .npy file created in step 2. After loading the .npy file, we need to convert its contents back to MXNet arrays so that I can finally use the model in MXNet.
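
A minimal sketch of steps 2 and 3, under several assumptions. It uses one .npz archive instead of many .npy files. It relies on the Gluon calls collect_params(), Parameter.data(), NDArray.asnumpy(), Parameter.set_data() and mx.nd.array(). The helper names save_params_npz and load_params_npz and the to_ndarray hook are made up for illustration. It also assumes the parameter names are identical on both machines and that the target network has been initialized. It has not been tried on s390x.

```python
import numpy as np


def save_params_npz(net, path):
    """Write every parameter of a Gluon block to one .npz archive."""
    arrays = {
        name: param.data().asnumpy()  # NDArray -> numpy.ndarray
        for name, param in net.collect_params().items()
    }
    np.savez(path, **arrays)


def load_params_npz(net, path, to_ndarray=None):
    """Copy arrays from an .npz archive into an already initialized block."""
    if to_ndarray is None:
        import mxnet as mx  # imported lazily; only needed for the conversion

        def to_ndarray(a):
            # Pass the dtype explicitly so the stored precision is kept.
            return mx.nd.array(a, dtype=a.dtype)

    with np.load(path) as archive:
        for name, param in net.collect_params().items():
            param.set_data(to_ndarray(archive[name]))
```

On the x86 machine one would call save_params_npz(net, "resnet18.npz") on the pretrained network. On the s390x machine one would build the same architecture with pretrained=False, call net.initialize(), and then call load_params_npz(net, "resnet18.npz"). Note that np.savez appends ".npz" when the path does not already end with it, so pass the full file name in both calls.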

Masquerade
  • Could you provide a minimal example that generates/saves/loads a model (without handling endian-ness)? – Han-Kwang Nienhuys Jul 12 '20 at 17:20
  • @Han-KwangNienhuys I want to load a preexisting gluon model in mxnet. Let's say we take the model to be resnet-50. So, I want to write the code for saving resnet-50 parameters as a numpy file (.npy). Then I want to import this .npy file to use the resnet-50 model on another machine. I'm not sure how I should code this in Python. Can you help? Using the .npy format would automatically resolve any endian issues. So, code that just generates/saves/loads a model would work without taking endianness into consideration. I'm not able to code this approach – Masquerade Jul 14 '20 at 14:53
  • https://stackoverflow.com/help/minimal-reproducible-example – Han-Kwang Nienhuys Jul 14 '20 at 14:56
  • @Han-KwangNienhuys I have added an example. Please let me know if you need any more information. – Masquerade Jul 14 '20 at 15:25
  • A reproducible example is something that can be copy-pasted for experimentation and works or demonstrates the problem without depending on your private files. – Han-Kwang Nienhuys Jul 14 '20 at 16:33
  • I think there is some misunderstanding. I don't know how to code my approach. For the solution, I don't want anyone to handle the endianness; it will automatically be handled by numpy, I believe. I want someone's help to code my approach that saves/loads the MXNet model using numpy. I'm unable to write the code myself. – Masquerade Jul 14 '20 at 16:47
  • @Han-KwangNienhuys Maybe this question isn't clear. Please check this - https://stackoverflow.com/questions/62942031/save-load-mxnet-model-parameters-using-numpy – Masquerade Jul 17 '20 at 04:32
  • I suppose the whole `.params` file is big-endian? If so, have you considered just converting a big-endian to little-endian instead of reverse engineering `.params` file? You could even try to do it on the fly if `import` function supports byte-stream as an argument instead of file path. – pkuderov Jul 17 '20 at 08:28
  • @somebody Maybe you can include the two snippets for load/save in this question, given that this is the one with the bounty. – Han-Kwang Nienhuys Jul 17 '20 at 11:21
  • @pkuderov I think MXNet Python API doesn't support importing binary stream. How do you suggest I proceed with the binary conversion? I'm unable to figure out which part of the serialization process is messing up the binary. Only after I figure that out can I think of creating a converter. – Masquerade Jul 20 '20 at 10:45

1 Answer

Starting from the code snippets posted in the other question, "Save/Load MXNet model parameters using NumPy":

It appears that mxnet has an option to store data internally as numpy arrays:

mx.npx.set_np(True, True)

Unfortunately, this option doesn't do what I hoped (my IPython session crashed).

The parameters are a dict of mxnet.gluon.parameter.Parameter instances, each of them containing attributes of other special datatypes. Disentangling this so that you can store it as a large number of pure numpy arrays (or a collection of them in an .npz file) is a hopeless task.

Fortunately, python has pickle to convert complex data structures into something more or less portable:

# (mxnet/resnet setup skipped)
parameters = resnet.collect_params()

import pickle
with open('foo.pkl', 'wb') as f:
    pickle.dump(parameters, f)

To restore the parameters:

with open('foo.pkl', 'rb') as f:
    parameters_loaded = pickle.load(f)

Essentially, it looks like resnet.save_parameters() as defined in mxnet/gluon/block.py gets the parameters (using _collect_parameters_with_prefix()) and writes them to a file using a custom write function which appears to be compiled from C (I didn't check the details).

You can save the parameters using pickle instead.

For loading, load_parameters (also in block.py) contains this code (with sanity checks removed):

for name in loaded:
    params[name]._load_init(loaded[name], ctx, cast_dtype=cast_dtype, dtype_source=dtype_source)

Here, loaded is a dict as loaded from the file. From examining the code, I don't fully grasp exactly what is being loaded - params seems to be a local variable in the function that is not used anymore. But it's worth a try to start from here, by writing a replacement for the load_parameters function. You can "monkey-patch" a function into an existing class by defining a function outside the class like this:

def my_load_parameters(self, filename, *args, **kwargs):
    # put your modified implementation here
    raise NotImplementedError

mx.gluon.Block.load_parameters = my_load_parameters
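
The monkey-patching pattern in isolation, as a runnable sketch. The Block class here is a hypothetical stand-in, not mx.gluon.Block, and the method bodies are placeholders:

```python
class Block:
    """Hypothetical stand-in for mx.gluon.Block, for illustration only."""

    def load_parameters(self, filename):
        return f"original loader: {filename}"


def my_load_parameters(self, filename):
    # A replacement implementation would go here.
    return f"patched loader: {filename}"


# Assigning a plain function to a class attribute replaces the method for
# all instances, including those created before the patch.
existing = Block()
Block.load_parameters = my_load_parameters
result = existing.load_parameters("foo.pkl")
print(result)
```

Because the lookup goes through the class, subclasses that do not override load_parameters pick up the replacement as well.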

Disclaimers/warnings:

  • Even if you get save/load via pickle to work on a single big-endian system, it's not guaranteed to work between systems of different endianness. The pickle protocol itself is endian-neutral, but if floating-point values (deep inside the mxnet.gluon.parameter.Parameter) were stored as a raw data buffer in machine-endian convention, then pickle is not going to magically guess that groups of bytes in the buffer (4 per float32, 8 per float64) need to be reversed. I think numpy arrays are endian-safe when pickled.
  • Pickle is not very robust if the underlying class definitions change between pickling and unpickling.
  • Never unpickle untrusted data.
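
A small check of the claim about numpy arrays, as a sketch rather than a proof. It runs on a single machine and simulates the other byte order by forcing an explicit dtype ('<f4' and '>f4'), so it does not exercise a real big-endian host. The sample values are arbitrary.

```python
import io
import pickle

import numpy as np

values = [1.0, 2.5, -3.0]
little = np.array(values, dtype="<f4")  # explicit little-endian float32
big = np.array(values, dtype=">f4")     # explicit big-endian float32

# The raw buffers differ, so dumping bytes without a dtype is not portable.
buffers_differ = little.tobytes() != big.tobytes()

# Pickle stores the dtype (byte order included) next to the buffer.
from_pickle = pickle.loads(pickle.dumps(big))

# The .npy header also records the dtype, so a round trip keeps the values.
buf = io.BytesIO()
np.save(buf, big)
buf.seek(0)
from_npy = np.load(buf)

print(buffers_differ,
      np.array_equal(from_pickle, little),
      np.array_equal(from_npy, little))
```

The point is that both formats carry the byte order with the data, which a raw machine-endian buffer inside another object does not.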
Han-Kwang Nienhuys
  • Still not working, got the following - `Traceback: loaded_dict = pickle.load(f) File "/root/mxnet-1.5.0/mxnet/python/mxnet/ndarray/ndarray.py", line 386, in __setstate__ check_call(_LIB.MXNDArrayLoadFromRawBytes(ptr, length, ctypes.byref(handle))) File "/root/mxnet-1.5.0/mxnet/python/mxnet/base.py", line 253, in check_call raise MXNetError(py_str(_LIB.MXGetLastError())) mxnet.base.MXNetError: [15:34:40] include/mxnet/./tuple.h:354: Check failed: ndim >= -1 (-906324999 vs. -1) : ndim cannot be less than -1, received -906324999` Anyways thanks for your effort. – Masquerade Jul 20 '20 at 10:37
  • I see you're using mxnet 1.5.0. I installed mxnet 1.6 (via pip, with Python 3.7.6/numpy 1.18.1 in Anaconda 64-bit Linux) and the pickle dump/load cycle as I posted works. – Han-Kwang Nienhuys Jul 22 '20 at 06:42
  • Did you save the parameters pickle file on a little-endian machine and then load it on a big-endian machine? – Masquerade Jul 22 '20 at 07:09
  • Sorry, no, I did the dump/load roundtrip on the same machine. I'm afraid you'll have to patch the C code. – Han-Kwang Nienhuys Jul 22 '20 at 08:28