I performed data augmentation on the MNIST dataset loaded through the sklearn library. Now I want to save the augmented dataset to a file, since computing it takes quite long.

I want to save it in a format similar to the original MNIST object, which is of type sklearn.utils.Bunch, or as a dictionary, so that the usual way of retrieving the data, X, y = mnist['data'], mnist['target'], is preserved.

How can I do that?

import matplotlib
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784")
X, y = mnist['data'], mnist['target']
y = y.astype(int)
# ....
# expand_dataset is my own augmentation function
X_augmented, y_augmented = expand_dataset(X, y)
data_augmented = {"data": X_augmented, "target": y_augmented}
# How to save to file?

I tried something like

import json
with open("MNIST_augmented", "w") as f:
    json.dump(data_augmented, f)

But I get the error

TypeError: Object of type ndarray is not JSON serializable

roschach

1 Answer

The issue is not specific to MNIST. The json module cannot serialize NumPy arrays, so if you want to store ndarray data as JSON you will have to do a bit more pre-processing first. See the question "NumPy array is not JSON serializable".
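
As a rough sketch of that pre-processing, one option is to convert each array to nested Python lists with ndarray.tolist() before dumping. The small arrays and the temporary output path below are stand-ins, not the real augmented data, and the sketch assumes the dictionary values are NumPy arrays (not pandas objects). JSON files for image data can get large, so this is mainly useful when a text format is required.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the real augmented arrays.
data_augmented = {
    "data": np.arange(12, dtype=np.float64).reshape(3, 4),
    "target": np.array([0, 1, 2]),
}

# ndarray.tolist() gives nested Python lists, which json can serialize.
serializable = {key: value.tolist() for key, value in data_augmented.items()}

# Temporary location chosen for illustration only.
path = os.path.join(tempfile.mkdtemp(), "MNIST_augmented.json")
with open(path, "w") as f:
    json.dump(serializable, f)

# Loading: rebuild the arrays from the nested lists.
with open(path) as f:
    loaded = {key: np.array(value) for key, value in json.load(f).items()}

X, y = loaded["data"], loaded["target"]
```

The original dtype is not stored in the JSON file, so it may need to be restored by hand after loading (for example with astype).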

Otherwise, you should be able to use numpy.save() or pickle directly with your dictionary. https://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html https://wiki.python.org/moin/UsingPickle
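
A minimal sketch of both routes, again with small stand-in arrays and a temporary directory instead of the real data. Note that the NumPy variant here uses numpy.savez_compressed rather than numpy.save: it stores each dictionary entry as a named array in a single .npz file, which keeps the X, y = d['data'], d['target'] access pattern without pickling the dictionary.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the real augmented arrays.
data_augmented = {
    "data": np.arange(12, dtype=np.float64).reshape(3, 4),
    "target": np.array([0, 1, 2]),
}
out_dir = tempfile.mkdtemp()  # illustration only; use any directory you like

# Option 1: pickle the whole dictionary (binary mode is required).
pkl_path = os.path.join(out_dir, "MNIST_augmented.pkl")
with open(pkl_path, "wb") as f:
    pickle.dump(data_augmented, f)
with open(pkl_path, "rb") as f:
    from_pickle = pickle.load(f)

# Option 2: one compressed .npz archive, one named array per dictionary key.
npz_path = os.path.join(out_dir, "MNIST_augmented.npz")
np.savez_compressed(npz_path, **data_augmented)
with np.load(npz_path) as npz:
    from_npz = {key: npz[key] for key in npz.files}

X, y = from_npz["data"], from_npz["target"]
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code; the .npz route avoids that concern for plain numeric arrays.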

Jayant Sahewal