-1

NOTE:

I am relatively new to Data Science / coding, so please let me know if I'm missing something fundamental.

Question

What is the suggested procedure for implementing a machine learning model in Microsoft excel?

My strategy for doing this has been the following:

1.) Vectorize my text data for training

2.) Train a logistic regression model

3.) Pickle trained model and fit vectorizer

4.) Load model/vectorizer into separate python file

5.) Convert python file into .exe file using PyInstaller

6.) Call python.exe file in Excel VBA, classifying text data as required

This procedure works, and I can call the model to classify text. However, a big problem is how long it takes to execute the .exe file. The python code that loads the pickled data only takes 10-20 seconds to execute. On the other hand, the .exe derived from this code takes at least 5 minute to execute given just a single string entry to classify.

The .exe file is also very large at 305 MB. This is probably due to installation of the sklearn modules needed to load the model and vectorizer. I am not sure if there is a lighter weight solution for this. I am certain there is a better alternative for a simple logistic regression model.

Ultimately, I am looking for any suggestions on how I may run my machine learning models in excel efficiently.

Sample of Python Code

Generate Logistic Regression Model and Pickle

 from sklearn.externals import joblib 
''' https://www.geeksforgeeks.org/saving-a-machine-learning-model/ '''

path = "N:\\Drive_Location\\Pickles"
model_string = "eSOMS_Logistic_Regression_Model.pkl"
vectorizer_transformer_string = "eSOMS_Logistic_Regression_vectorizer_transformer.pkl"
model_full_path = path +'\\' + model_string
vectorizer_full_path = path +'\\' + vectorizer_transformer_string

''' Save the model as a pickle in a file '''
joblib.dump(classifier, model_full_path) 
joblib.dump(vectorizer.transform, vectorizer_full_path) 
logistic_regression_from_joblib = joblib.load(model_full_path)  
vectorizer_transformer_from_joblib = joblib.load(vectorizer_full_path)

Load the Pickled File into Separate Python File

from sklearn.externals import joblib 

path = "N:\\Drive_Location\\Pickles"
model_string = "eSOMS_Logistic_Regression_Model.pkl"
vectorizer_transformer_string = "eSOMS_Logistic_Regression_vectorizer_transformer.pkl"
model_full_path = path +'\\' + model_string
vectorizer_full_path = path +'\\' + vectorizer_transformer_string

logistic_regression_from_joblib = joblib.load(model_full_path)  
vectorizer_transformer_from_joblib = joblib.load(vectorizer_full_path)

import sys

argument = sys.argv[1]

def myfunction(string):
    print (logistic_regression_from_joblib.predict(vectorizer_transformer_from_joblib([string])))

myfunction(argument)

Create .Exe File in Command Prompt using PyInstaller

In command prompt, I created an executable file from the .py file.

pyinstaller --onefile eSOMS_MS_Executable.spec

NOTE: I adjusted the .spec file fed to PyInstaller to up the recursion limit and to specify numerous hidden imports:

# -*- mode: python -*-

import sys
sys.setrecursionlimit(5000)

block_cipher = None


a = Analysis(['eSOMS_MS_Executable.py'],
             pathex=['C:\\Users\\Username\\Desktop\\Python\\eSOMS'],
             binaries=[],
             datas=[],
             hiddenimports = ['sklearn.linear_model', 'sklearn.utils.sparsetools._graph_validation', 'sklearn.utils.sparsetools._graph_tools', 'sklearn.neighbors.typedefs', 'sklearn.feature_extraction'],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
             cipher=block_cipher)
exe = EXE(pyz,
          a.scripts,
          a.binaries,
          a.zipfiles,
          a.datas,
          [],
          name='eSOMS_MS_Executable',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          runtime_tmpdir=None,
          console=True )

Code works, but it's just too slow to be practical.

  • Out of curiosity, why would you need to use excel? – Bartek Malysz May 30 '19 at 21:29
  • I will be building additional functionality within the excel sheet that will integrate with the python code executable. I want the user to be able to be able to enter in different time ranges, and all my users are comfortable using Excel. – Matt Pennington Jun 02 '19 at 23:07
  • Thank you, I understand, a lot of folks are resistant to change and they like to go by their ways, but please consider this being an opportunity for innovation. You can easily build a front end using Flask, Dash, Django where users could interact with your model using an interface similar to mobile banking every one uses or some online account management system. I have the same experience from my organisation, but if you can pull it off the right way your setting yourself up for a promotion. Organisations like innovative thinkers. Good luck. – Bartek Malysz Jun 03 '19 at 18:39
  • That's a fair comment. There would be some challenges going this route, since we have a pretty restrictive IT. However, it may be the best long term solution. – Matt Pennington Jun 03 '19 at 19:44

1 Answers1

0

After discussing with IT friends on mine, they directed me to this post: App created with PyInstaller has a slow startup

I learned that executable created from --onefile can be very large / slow. I changed to --onedir and the executable became much smaller, and now is only 15 MB (big improvement vs. 305 MB).

pyinstaller --onedir eSOMS_MS_Executable.spec

Thanks guys!