Wrapper for Keras Model in Spark

Question

I have a Keras neural network and I want to deploy the model in the Spark environment using a wrapper, so I followed the tutorial here:

import math
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Input, Dense, Conv1D, Conv2D, MaxPooling2D, Dropout,Flatten
from keras import backend as K
from keras.models import Model
import numpy as np
import matplotlib.pyplot as plt


from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()


# Expect a NumPy n-dimensional array of shape (60000, 28, 28)

type(X_train), X_train.shape


# This time, however, we flatten each 28 x 28 image into a vector of length 784

X_train = X_train.reshape(-1, 784)
X_test = X_test.reshape(-1, 784)

# Expect NumPy n-dimensional arrays of shape (60000, 784) for the training data and (10000, 784) for the test data
type(X_train), X_train.shape, X_test.shape


# We also use sklearn's MinMaxScaler for normalizing. The scaler is fitted on the
# training data only and then reused for the test data, so both sets are scaled identically.

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


# We define the same Keras model as earlier

input_shape = (1,28,28) if K.image_data_format() == 'channels_first' else (28,28, 1)
keras_model = Sequential()
keras_model.add(Conv2D(32, kernel_size=(5, 5), activation='relu', input_shape=input_shape, padding='same'))
keras_model.add(MaxPooling2D(pool_size=(2, 2)))
keras_model.add(Conv2D(64, (5, 5), activation='relu', padding='same'))
keras_model.add(MaxPooling2D(pool_size=(2, 2)))
keras_model.add(Flatten())
keras_model.add(Dense(512, activation='relu'))
keras_model.add(Dropout(0.5))
keras_model.add(Dense(10, activation='softmax'))
keras_model.summary()


# Import the Keras to DML wrapper and define some basic variables

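from pyspark.sql import SparkSession
from systemml.mllearn import Keras2DML

# Keras2DML needs a SparkSession: the pyspark shell predefines `spark`,
# but a standalone script (e.g. one run with spark-submit) must create it.
spark = SparkSession.builder.getOrCreate()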
epochs = 5
batch_size = 100
samples = 60000
max_iter = int(epochs*math.ceil(samples/batch_size))

# Now create a SystemML model by calling Keras2DML with your Spark session, the Keras model,
# its input shape and the predefined variables. display=10 prints the training results every 10 iterations.

sysml_model = Keras2DML(spark, keras_model, input_shape=(1,28,28), weights='weights_dir', batch_size=batch_size, max_iter=max_iter, test_interval=0, display=10)

# Start training. More Spark workers and a better machine configuration mean faster training!

sysml_model.fit(X_train, y_train)

# Test your model's performance on the held-out test set, and iterate again if required
sysml_model.score(X_test, y_test)
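The `max_iter` value passed to Keras2DML is simply the number of epochs times the number of mini-batches per epoch; this is also why the script relies on the `math` module. A minimal sketch of that arithmetic with the tutorial's values (the helper name `max_iterations` is hypothetical and not part of SystemML):

```python
import math


def max_iterations(epochs: int, samples: int, batch_size: int) -> int:
    """Hypothetical helper: total mini-batch iterations for the given number of epochs.

    math.ceil counts a final partial batch as one extra iteration.
    """
    batches_per_epoch = math.ceil(samples / batch_size)
    return epochs * batches_per_epoch


print(max_iterations(5, 60000, 100))  # 3000: 600 batches per epoch x 5 epochs
print(max_iterations(1, 60050, 100))  # 601: the 50 leftover samples form one extra batch
```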

At the line `from systemml.mllearn import Keras2DML`, I get the following error:

Traceback (most recent call last):
  File "d:/SparkJarDirectory/./NNSpark.py", line 58, in <module>
    from systemml.mllearn import Keras2DML
  File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\systemml\mllearn\__init__.py", line 45, in <module>
    from .estimators import *
  File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\systemml\mllearn\estimators.py", line 917
    def __init__(self, sparkSession, keras_model, input_shape, transferUsingDF=False, load_keras_weights=True, weights=None, labels=None, batch_size=64, max_iter=2000, test_iter=10, test_interval=500, display=100, lr_policy="step", weight_decay=5e-4, regularization_type="L2"):
    ^
SyntaxError: import * only allowed at module level
2019-03-12 20:25:48 INFO ShutdownHookManager:54 - Shutdown hook called
2019-03-12 20:25:48 INFO ShutdownHookManager:54 - Deleting directory C:\Users\xyz\AppData\Local\Temp\spark-2e1736f8-1798-42da-a157-cdf0ade1bf36

From what I understand, the problem is in the library itself, which uses

from .estimators import *

__all__ = estimators.__all__

I am not sure why the wrapper is not working or what fix is required. Any help is appreciated.
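For context: in Python 3 a wildcard import is only legal at the top level of a module (Python 2 only issued a warning for this pattern), and the check happens at compile time, so the whole module fails to import. The `from .estimators import *` in `__init__.py` is at module level and therefore allowed; the traceback instead points at the `def __init__` on line 917 of estimators.py, which suggests the offending `import *` sits inside that method body. A minimal, self-contained reproduction, using `math` as a stand-in module and a hypothetical helper `star_import_error`:

```python
def star_import_error(source: str):
    """Compile `source` without running it; return the SyntaxError message, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# Wildcard import at module level: accepted.
module_level = "from math import *\n"

# Wildcard import inside a method, as the traceback suggests estimators.py does.
inside_method = (
    "class Wrapper:\n"
    "    def __init__(self):\n"
    "        from math import *\n"
)

print(star_import_error(module_level))   # None
print(star_import_error(inside_method))  # import * only allowed at module level
```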

score 0 · Answer 1 · answered Mar 12 '19 at 13:23


I think the SystemML 1.2.0 release is missing some fixes for Python 3.5 (https://github.com/apache/systemml/commit/9e7ee19a45102f7cbb37507da25b1ba0641868fd), so you will need to install SystemML from source. For my setup, which is different from yours, that means cloning the repository with git and then running "cd src/main/python; sudo python3.4 setup.py install".
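After installing from source, one way to check whether the installed copy still has the problem, without starting Spark, is to locate the package with `importlib.util.find_spec` (which does not import a top-level package) and merely compile `mllearn/estimators.py`. A sketch, assuming the `systemml/mllearn/estimators.py` layout shown in the traceback; the helper name `check_estimators` is hypothetical:

```python
import importlib.util
import os


def check_estimators(package: str = "systemml") -> str:
    """Hypothetical helper: compile <package>/mllearn/estimators.py without importing it."""
    spec = importlib.util.find_spec(package)  # top-level name: located, not imported
    if spec is None or not spec.submodule_search_locations:
        return "not installed"
    package_dir = list(spec.submodule_search_locations)[0]
    path = os.path.join(package_dir, "mllearn", "estimators.py")
    if not os.path.isfile(path):
        return "estimators.py not found"
    with open(path, encoding="utf-8") as source_file:
        source = source_file.read()
    try:
        compile(source, path, "exec")  # syntax and scope checks only; nothing is executed
    except SyntaxError as exc:
        return f"SyntaxError on line {exc.lineno}: {exc.msg}"
    return "ok"


print(check_estimators())
```

Depending on the Python version, the reported line may be the enclosing `def` (as in the traceback) rather than the import statement itself.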


vladmihaisima


But the latest release is 1.2.0 itself. – Ricky Mar 13 '19 at 03:51
As far as I can tell they did not make a release including the fix that you look for. You have the option to wait for a new release or install from the source master branch. – vladmihaisima Mar 13 '19 at 10:43
How do I install from the source master branch? – Ricky Mar 13 '19 at 11:11
I am using `pip install git+https://github.com/apache/systemml`, but how do I install the master branch? – Ricky Mar 13 '19 at 11:22
Then you can use `pip install 'git+https://github.com/apache/systemml/@master#egg=systemml&subdirectory=src/main/python'` – vladmihaisima Mar 13 '19 at 14:50
I am getting this error. https://stackoverflow.com/questions/55155702/unable-to-install-git-repo-by-pip – Ricky Mar 14 '19 at 06:53
I have a Linux environment myself, so I can't reproduce that error, but one comment: did you quote the URL argument as shown in my command? The hash (#) and ampersand (&) are sometimes interpreted by the shell rather than being treated as part of the argument string. Good luck! – vladmihaisima Mar 14 '19 at 09:14
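On Windows in particular, cmd.exe does not treat single quotes as quoting characters and uses `&` as a command separator, so the command as typed there can be split at `&`. One way to sidestep shell quoting entirely is to call pip from Python with an argument list: `subprocess.run` given a list does not pass the arguments through a shell (unless `shell=True`), so `#` and `&` reach pip unchanged. A sketch; the helper names are hypothetical, and the requirement string reproduces the one from the comment:

```python
import subprocess
import sys


def pip_vcs_url(repo: str, ref: str, egg: str, subdirectory: str) -> str:
    """Hypothetical helper: pip VCS requirement for a package living in a repo subdirectory."""
    return f"git+{repo}@{ref}#egg={egg}&subdirectory={subdirectory}"


def pip_install_command(requirement: str) -> list:
    """Argument list for `python -m pip install <requirement>`; no shell parses it."""
    return [sys.executable, "-m", "pip", "install", requirement]


def pip_install(requirement: str) -> None:
    """Run pip with the current interpreter; raises CalledProcessError on failure."""
    subprocess.run(pip_install_command(requirement), check=True)


url = pip_vcs_url("https://github.com/apache/systemml/", "master", "systemml", "src/main/python")
print(url)
# git+https://github.com/apache/systemml/@master#egg=systemml&subdirectory=src/main/python

# Uncomment to actually install (needs git and network access):
# pip_install(url)
```

Using `sys.executable -m pip` also ensures the package lands in the same interpreter (for example, the Anaconda Python from the traceback) that will later import it.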
