
I am using IBM Watson Studio (default Spark Python environment) and trying to convert a Keras model to SystemML DML and train it on Spark.

!pip install systemml 
import systemml

This executes just fine, but this:

from systemml import mllearn 

throws SyntaxError: import * only allowed at module level

dir(systemml)

doesn't show mllearn.

I tried to install it from http://www.romeokienzler.com/systemml-1.0.0-SNAPSHOT-python.tar.gz and https://sparktc.ibmcloud.com/repo/latest/systemml-1.0.0-SNAPSHOT-python.tar.gz, and from a git clone, but without success. What am I doing wrong?

3 Answers


You need to run dir(systemml.mllearn) (after import systemml.mllearn) to see the mllearn functions:

>>> dir(systemml.mllearn)
['Caffe2DML', 'Keras2DML', 'LinearRegression', 'LogisticRegression', 
'NaiveBayes', 'SVM', '__all__', '__builtins__', '__doc__', '__file__', 
'__name__', '__package__', '__path__', 'estimators']
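Why dir(systemml) does not list mllearn: dir() on a package shows only names already bound on the package object, and a subpackage is usually bound only after something imports it. As a minimal sketch (list_submodules is a hypothetical helper, not part of SystemML), pkgutil.iter_modules can list the submodules on disk without importing them, which also works when importing mllearn itself fails. The standard-library json package stands in for systemml in the test.

```python
import pkgutil
import types


def list_submodules(package: types.ModuleType) -> list:
    """Names of a package's direct submodules, found on disk without importing them."""
    # pkgutil.iter_modules yields (finder, name, ispkg) tuples for each entry
    # on the package's __path__; plain tuple unpacking also works on Python 3.5.
    return sorted(name for _finder, name, _ispkg in pkgutil.iter_modules(package.__path__))


# Usage in the notebook (assumption: the installed systemml ships mllearn):
#   import systemml
#   print("mllearn" in list_submodules(systemml))
```

If mllearn is missing from that list, the installed package is not the one you expect; if it is present, the problem is in importing it, not in finding it.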

Please install SystemML 1.2 from pypi.org; 1.2 is the latest release (August 2018). Release 1.0 had only experimental support.

Can you please try to only import MLContext, just to see whether loading the main SystemML jar file works, and what version your installation uses?

>>> from systemml import MLContext
>>> ml = MLContext(sc)

Welcome to Apache SystemML!
Version 1.2.0

>>> print (ml.buildTime())
2018-08-17 05:58:31 UTC
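buildTime() tells you which SystemML jar the JVM loaded. To check the Python side as well, here is a minimal sketch (locate_package is a hypothetical helper) that resolves where the kernel would import a package from and lists the jars bundled next to it. importlib.util.find_spec on a top-level name does not execute the package, so this works even when importing systemml.mllearn raises a SyntaxError. The systemml-java subfolder name is an assumption taken from the install paths quoted in the comments in this thread.

```python
import glob
import importlib.util
import os


def locate_package(name: str):
    """Return (package_dir, bundled_jars) for an importable package, or None."""
    spec = importlib.util.find_spec(name)  # top-level name: resolved, not executed
    if spec is None or spec.origin is None:
        return None
    package_dir = os.path.dirname(spec.origin)  # origin is the package's __init__.py
    # Assumption: SystemML ships its jars in a 'systemml-java' subfolder.
    jars = sorted(glob.glob(os.path.join(package_dir, "systemml-java", "*.jar")))
    return package_dir, jars


# Usage in the notebook: compare the directory with the Location shown by
# "pip show systemml"; a mismatch means the kernel sees a different copy.
#   print(locate_package("systemml"))
```

Note that this only shows which Python package is resolved and which jars it bundles; it cannot tell you which jar Spark put on its class path, which is what buildTime() reports.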

>>> from sklearn import datasets
>>> from systemml.mllearn import LogisticRegression
>>> 
>>> # Load the 8x8 digits dataset and hold out the last 10% for testing
>>> digits = datasets.load_digits()
>>> X_digits = digits.data
>>> y_digits = digits.target
>>> n_samples = len(X_digits)
>>> X_train = X_digits[:int(.9 * n_samples)]
>>> y_train = y_digits[:int(.9 * n_samples)]
>>> X_test = X_digits[int(.9 * n_samples):]
>>> y_test = y_digits[int(.9 * n_samples):]
>>> 
>>> # 'spark' is the SparkSession provided by the notebook
>>> logistic = LogisticRegression(spark)
>>> 
>>> print('LogisticRegression score: %f' % logistic.fit(X_train, y_train).score(X_test, y_test))
18/10/20 00:15:52 WARN BaseSystemMLEstimatorOrModel: SystemML local memory budget:5097 mb. Approximate free memory available on the driver JVM:416 mb.
18/10/20 00:15:52 WARN StatementBlock: WARNING: [line 81:0] -> maxinneriter -- Variable maxinneriter defined with different value type in if and else clause.
18/10/20 00:15:53 WARN SparkExecutionContext: Configuration parameter spark.driver.maxResultSize set to 1 GB. You can set it through Spark default configuration setting either to 0 (unlimited) or to available memory budget of size 4 GB.
BEGIN MULTINOMIAL LOGISTIC REGRESSION SCRIPT
...
  • !pip install systemml reports: Downloading https://files.pythonhosted.org/packages/b1/94/62104cb8c526b462cd501c7319926fb81ac9a5668574a0b3407658a506ab/systemml-1.2.0.tar.gz (9.7MB) ... Successfully installed Pillow-5.3.0 numpy-1.15.2 pandas-0.23.4 python-dateutil-2.7.3 pytz-2018.5 scikit-learn-0.20.0 scipy-1.1.0 six-1.11.0 systemml-1.2.0 ... but from systemml import MLContext; ml = MLContext(sc); print(ml.buildTime()) gives "2017-04-19 21:45:10 UTC". – Vinitha Palani Oct 20 '18 at 10:10
  • Did you restart your kernel after pip install? – Berthold Reinwald Oct 24 '18 at 06:29
  • If yes, then your pip command installs SystemML 1.2 in your "private" area, while your notebook environment picks up an old SystemML from a different path. – Berthold Reinwald Oct 24 '18 at 06:47
  • You need to find out what your class path is, and add your pip installed SystemML jars to it. "pip show systemml" will show you the install path. – Berthold Reinwald Oct 24 '18 at 06:50
  • I always created a symbolic link to add my JARs to the class path. "~/data/libs/" was on the class path by default, and user jars are picked up from there. If that is the case for you, and assuming your jars are installed under ~/.local/lib/..., creating the links below and restarting your kernel should solve your problem:
      !ln -s -f ~/.local/lib/python2.7/site-packages/systemml/systemml-java/systemml-1.2.0-extra.jar ~/data/libs/systemml-1.2.0-extra.jar
      !ln -s -f ~/.local/lib/python2.7/site-packages/systemml/systemml-java/systemml-1.2.0.jar ~/data/libs/systemml-1.2.0.jar
    – Berthold Reinwald Oct 24 '18 at 06:50
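The two ln -s -f commands in the comment above can also be done from Python, which is handy when the jar version or directory changes. This is a minimal sketch, assuming Python 3 (link_jars is a hypothetical helper); it links every jar it finds instead of hard-coding the two file names, and like ln -f it replaces whatever already sits at the destination.

```python
import glob
import os


def link_jars(src_dir: str, dest_dir: str) -> list:
    """Symlink every *.jar in src_dir into dest_dir, replacing existing entries.

    Equivalent to running `ln -s -f JAR dest_dir/JAR` for each jar.
    Returns the paths of the links that were created.
    """
    os.makedirs(dest_dir, exist_ok=True)
    links = []
    for jar in sorted(glob.glob(os.path.join(src_dir, "*.jar"))):
        link = os.path.join(dest_dir, os.path.basename(jar))
        # Like ln -f: remove any existing file or (possibly dangling) link first.
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(os.path.abspath(jar), link)
        links.append(link)
    return links


# Example with the paths from the comment (adjust python2.7 to your kernel's
# version), then restart the kernel:
#   link_jars(os.path.expanduser("~/.local/lib/python2.7/site-packages/systemml/systemml-java"),
#             os.path.expanduser("~/data/libs"))
```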

The code works with the Python 2.7 kernel, but not with the Python 3.5 kernel. The commit https://github.com/apache/systemml/commit/9e7ee19a45102f7cbb37507da25b1ba0641868fd fixes the issue for Python 3.5. If you want to fix the older released version in your local environment, follow these two steps:

A. Fix for the indentation requirement of Python 3.5:

pip install autopep8
find /<location>/systemml/ -name '*.py' | xargs autopep8 --in-place --aggressive
find /<location>/systemml/mllearn/ -name '*.py' | xargs autopep8 --in-place --aggressive

You can find <location> in the Location field printed by pip show systemml.
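Assuming the indentation problem is the common one, tabs mixed with spaces, here is a small illustration of why step A is needed. Python 2 treated a tab as "advance to the next multiple of 8", so the two body lines below had the same indentation; Python 3 refuses to guess and raises TabError. The snippet is hypothetical, not SystemML source, and expandtabs only mimics the tab-related part of what autopep8 does.

```python
def py3_compile_error(source: str):
    """Compile source as Python 3; return the error class name, or None if it compiles."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:  # TabError and IndentationError are subclasses
        return type(err).__name__
    return None


# First body line is indented with a tab, the second with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"

print(py3_compile_error(mixed))                # TabError
print(py3_compile_error(mixed.expandtabs(8)))  # None
```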

B. Fix for the stricter Python 3.5 syntax: replace the following line in mllearn/estimators.py

from .keras2caffe import *

with

import keras
from .keras2caffe import convertKerasToCaffeNetwork, convertKerasToCaffeSolver, convertKerasToSystemMLModel
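The error from the question ("import * only allowed at module level") is what the Python 3 compiler raises when a wildcard import appears inside a function or method body; Python 2 only issued a warning. Presumably the wildcard import in estimators.py sits inside such a body, which is why naming the imported functions explicitly fixes it. A minimal demonstration with hypothetical snippets (not the actual SystemML code):

```python
def compile_error_message(source: str):
    """Return the SyntaxError message for source, or None if it compiles."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


# Wildcard import inside a function body: rejected by Python 3.
inside_function = "def build():\n    from os.path import *\n    return join('a', 'b')\n"

# Same function with the imported name listed explicitly: compiles.
explicit_names = "def build():\n    from os.path import join\n    return join('a', 'b')\n"

print(compile_error_message(inside_function))  # import * only allowed at module level
print(compile_error_message(explicit_names))   # None
```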

The fix has already been committed, but to get it from pip you will have to wait for the next release, i.e. 1.3.0. Alternatively, you can build and install the latest version from source:

git clone https://github.com/apache/systemml.git
cd systemml
mvn package -P distribution
pip install target/systemml-1.3.0-SNAPSHOT-python.tar.gz

Thanks,

Niketan.


Finally, this works perfectly if you are working in an IBM Cloud notebook:

1)

! pip install --upgrade https://github.com/niketanpansare/future_of_data/raw/master/systemml-1.3.0-SNAPSHOT-python.tar.gz

2)

!ln -s -f /home/spark/shared/user-libs/python3/systemml/systemml-java/systemml-1.3.0-SNAPSHOT-extra.jar ~/user-libs/spark2/systemml-1.3.0-SNAPSHOT-extra.jar


!ln -s -f /home/spark/shared/user-libs/python3/systemml/systemml-java/systemml-1.3.0-SNAPSHOT.jar ~/user-libs/spark2/systemml-1.3.0-SNAPSHOT.jar


tanvir