I have the MNIST training data as a list in the following form:
def load_data():
    f = gzip.open('mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = cPickle.load(f, encoding='latin1')
    f.close()
    return (training_data, validation_data, test_data)
def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = list(zip(training_inputs, training_results))
    ........................................
Now I would like to preprocess my training inputs to have zero mean and unit variance, so I used from sklearn import preprocessing as follows:
def SGD(self, training_data, epochs, mini_batch_size, eta,
        test_data=None):
    if test_data: n_test = len(test_data)
    preprocessed_training = preprocessing.scale(training_data)
    n = len(preprocessed_training)
    for j in range(epochs):
        random.shuffle(preprocessed_training)
        mini_batches = [
            training_data[k:k+mini_batch_size].....
    ....................
However, I'm getting the following error:
ValueError: setting an array element with a sequence.
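For context, the scaling call above expects a 2-D numeric array of shape (n_samples, n_features), whereas training_data here is a list of (input, label) tuples whose two arrays have different shapes, which is the likely source of the error. The following is a minimal sketch, not a definitive fix: it uses synthetic data in place of MNIST, standardizes only the inputs with plain NumPy (a stand-in for preprocessing.scale), and re-pairs them with the labels. The names X, X_scaled and scaled_training_data are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in for the MNIST pairs (an assumption for illustration):
# 5 samples, each a (784x1 input column, 10x1 one-hot label column) tuple.
rng = np.random.default_rng(0)
training_data = [
    (rng.random((784, 1)).astype(np.float32), np.eye(10)[:, [i % 10]])
    for i in range(5)
]

# Stack only the inputs into a 2-D float array of shape (n_samples, 784).
X = np.hstack([x for x, _ in training_data]).T.astype(np.float64)

# Standardize each pixel column to zero mean and unit variance.
# Columns with zero variance are divided by 1 so they stay at zero.
mean = X.mean(axis=0)
std = X.std(axis=0)
std[std == 0] = 1.0
X_scaled = (X - mean) / std

# Re-pair the scaled inputs (as 784x1 columns) with the untouched labels.
scaled_training_data = [
    (row.reshape(784, 1), y) for row, (_, y) in zip(X_scaled, training_data)
]
```

The labels are deliberately left out of the scaling step, since only the pixel values need zero mean and unit variance.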
I'm modifying code from mnielsen that can be found here. I'm new to Python and machine learning in general. I would appreciate it if anyone could help me out. Note: If you think there is a better library option, please let me know as well.
Update_1: This was another attempt of mine, which gives the same error.
scaler = StandardScaler()
scaler.fit(training_data)
training_data = scaler.transform(training_data)
if test_data: test_data = scaler.transform(test_data)
Update_2: I tried the solution provided in the suggested answer using a pandas DataFrame, but I am still getting the same error.
Update_3: So the data is of object type, but I need float type to apply the scaler. I did the following: training_data = np.asarray(training_data).astype(np.float64)
and I still get the error!
Update_4: General MNIST dataset structure: 50k training images and 10k test images. Each image is 28 * 28 pixels, which gives 784 data points. For example, a data point whose original output is 5 is the tuple (array([ 0., 0., 0., ..., 0., 0., 0.], dtype=float32), 5).
The first element of the tuple is the input image (784 greyscale floats, most of which are zero), and the second element is the output, a number from 0 through 9. With one-hot encoding, the output is instead a 10D vector in which every value is zero except at the index of the output value. So for the number 5 it is [[0],[0],[0],[0],[0],[1],[0],[0],[0],[0]].
The wrapper modification that I'm using can be found here.
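
The data layout described in Update_4 can be sketched as follows. This is only an illustration under stated assumptions: the body of vectorized_result is not shown in this post, so the version here is assumed to match the one-hot description above, and pixels and label are hypothetical stand-ins for one raw MNIST sample.

```python
import numpy as np

def vectorized_result(j):
    """Return a 10x1 column vector with 1.0 at index j and zeros elsewhere.

    Assumed implementation, written to match the one-hot description.
    """
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

# Hypothetical raw sample: 784 greyscale floats and the digit label.
pixels = np.zeros(784, dtype=np.float32)
label = 5

# The wrapper reshapes the input to a 784x1 column and one-hot encodes the label,
# so each training pair holds two arrays of different shapes.
sample = (np.reshape(pixels, (784, 1)), vectorized_result(label))
```

Each element of training_data is a pair like sample, which is why the list as a whole cannot be viewed as a single rectangular float array.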