
There are many ways to save a model and its weights. This is confusing because I have not found a single source that lists these formats and compares their properties.

Some of the formats I know are:
1. YAML File - Structure only
2. JSON File - Structure only
3. H5 Complete Model - Keras
4. H5 Weights only - Keras
5. ProtoBuf - Deployment using TensorFlow serving
6. Pickle - Scikit-learn
7. Joblib - Scikit-learn - replacement for Pickle, for objects containing large data.
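
To make the "structure only" versus "complete model" distinction in the list above concrete, here is a minimal sketch using only the Python standard library. `TinyLinearModel` and its methods are hypothetical stand-ins, not any framework's API; JSON plays the role of a structure-only file, and a pickled dict of plain data plays the role of a complete save.

```python
import json
import pickle


class TinyLinearModel:
    """Hypothetical stand-in for a framework model: y = w . x + b."""

    def __init__(self, units):
        self.units = units            # structure: how the model is built
        self.weights = [0.0] * units  # parameters: what training learns
        self.bias = 0.0

    def get_config(self):
        # Structure only, analogous to a JSON/YAML architecture file.
        return {"units": self.units}

    def get_state(self):
        # Structure plus parameters, analogous to a complete save.
        return {"config": self.get_config(),
                "weights": self.weights,
                "bias": self.bias}

    @classmethod
    def from_state(cls, state):
        model = cls(**state["config"])
        model.weights = list(state["weights"])
        model.bias = state["bias"]
        return model

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


trained = TinyLinearModel(units=2)
trained.weights, trained.bias = [0.5, -1.0], 2.0  # pretend these were learned

# Structure-only round trip: the architecture survives, the parameters do not.
structure_json = json.dumps(trained.get_config())
rebuilt = TinyLinearModel(**json.loads(structure_json))

# Complete round trip: architecture and parameters both survive.
blob = pickle.dumps(trained.get_state())
restored = TinyLinearModel.from_state(pickle.loads(blob))

print(rebuilt.predict([1.0, 1.0]))   # untrained parameters
print(restored.predict([1.0, 1.0]))  # same prediction as `trained`
```

A model rebuilt from a structure-only file has to be retrained or paired with a separate weights file before it is useful for prediction.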

Discussion:
Unlike scikit-learn, Keras does not recommend you save models using pickle. Instead, models are saved as an HDF5 file. The HDF5 file contains everything you need to not only load the model to make predictions (i.e., architecture and trained parameters) but also to restart training (i.e., loss and optimizer settings and the current state).

What other formats exist for saving models in scikit-learn, Keras, TensorFlow, and MXNet? Also, what information am I missing about each of the formats listed above?

superduper
  • Why are you mixing multiple libraries like Keras and MXNet? H5 is pretty standard for Keras/TensorFlow 2.0 models. – Zabir Al Nazi Apr 09 '20 at 07:52
  • I wanted this thread to cover everything a fresher wants to know when saving a model, whether they are working with scikit-learn, TensorFlow, or MXNet. – superduper Apr 09 '20 at 07:55
  • don't think it's a knowledge forum or blog, this can not have a complete or meaningful answer whatsoever. questions should be specific so that people can answer that. voting to close. – Zabir Al Nazi Apr 09 '20 at 08:04
  • You seem to be using SO as if it was a forum, it is not, this is not a thread, but a question. You do not seem to be asking anything. – Dr. Snoopy Apr 09 '20 at 08:15
  • It's good to know which format would be most interoperable and perhaps asking for whether a standard exists within the ML community? – ptn77 Sep 02 '22 at 20:08

3 Answers

There are also formats like ONNX, which is supported by most frameworks and helps remove the confusion of using a different format for each framework.

divyank

There is also the TFJS (TensorFlow.js) format, which lets you use the model in web or Node.js environments. Additionally, you will need the TF Lite format to run inference on mobile and edge devices. More recently, TF Lite for Microcontrollers takes the model as a byte array in a C header file.
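
To illustrate the "byte array in a C header file" idea, here is a rough Python sketch that turns arbitrary bytes into C source text. The function name `bytes_to_c_header`, the variable name `g_model`, and the four placeholder bytes are all assumptions for illustration; in practice the input would be the contents of a converted model file, and a tool such as `xxd -i` is commonly used for this step.

```python
def bytes_to_c_header(data: bytes, var_name: str = "g_model") -> str:
    """Render raw bytes as a C array definition (hypothetical helper)."""
    lines = []
    # Emit 12 bytes per line to keep the header readable.
    for i in range(0, len(data), 12):
        chunk = data[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {var_name}_len = {len(data)};\n"
    )


# Placeholder bytes standing in for a real model file's contents.
fake_model = b"\x00\x01\x02\xff"
header = bytes_to_c_header(fake_model)
print(header)
```

The resulting text can be written to a `.h` or `.cc` file and compiled into the firmware, so the device needs no filesystem to load the model.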

monatis

Your question on formats for saving a model has multiple possible answers, based on why you want to save your model:

  1. Save your model to resume training it later
  2. Save your model to load it for inference later

These scenarios give you a couple of options:

You could save your model using the library-specific saving functions; if you want to resume training, make sure that you have saved all the information you need to really be able to resume training. Formats here will vary by library, and indeed are not aimed at being formats that you would inspect or read in any way - they are just files. If you are looking for a library that wraps all of these save functions behind a common API, you should check out the modelstore Python library.
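
As a rough illustration of what "all the information you need" means when resuming training, the sketch below uses a toy one-parameter problem and the standard library's `pickle`. The `train` and `new_state` functions and the dict keys are hypothetical; real libraries store the equivalent state in their own formats. The point is that the optimizer state (here, the momentum velocity) and the step counter must be saved alongside the weights.

```python
import os
import pickle
import tempfile


def train(state, steps, lr=0.1, momentum=0.9):
    """Minimise (w - 3)^2 with SGD plus momentum, updating `state` in place."""
    for _ in range(steps):
        grad = 2.0 * (state["w"] - 3.0)
        state["velocity"] = momentum * state["velocity"] - lr * grad
        state["w"] += state["velocity"]
        state["step"] += 1
    return state


def new_state():
    # Weight, optimizer state, and progress counter together.
    return {"w": 0.0, "velocity": 0.0, "step": 0}


# Reference: 10 uninterrupted training steps.
reference = train(new_state(), 10)

# Interrupted run: 5 steps, save the full state to disk.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
with open(path, "wb") as f:
    pickle.dump(train(new_state(), 5), f)

# Resume from the full checkpoint and train 5 more steps.
with open(path, "rb") as f:
    resumed = train(pickle.load(f), 5)

# Resume as if only the weight had been saved: optimizer state is reset.
with open(path, "rb") as f:
    weights_only = pickle.load(f)
weights_only["velocity"] = 0.0
weights_only = train(weights_only, 5)

print(resumed == reference)                 # full checkpoint matches
print(weights_only["w"] == reference["w"])  # weights-only resume drifts
```

A weights-only file is still enough for inference; it is only continued training that depends on the extra state.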

You may also want to use a common format like ONNX; converters from Keras to ONNX and from scikit-learn to ONNX are available, but it is uncommon to use this format to resume training later. The benefit here is that all your models are saved in one common format, which may streamline the process of loading them later.

neal