NOTE:
I am relatively new to Data Science / coding, so please let me know if I'm missing something fundamental.
Question
What is the suggested procedure for implementing a machine learning model in Microsoft excel?
My strategy for doing this has been the following:
1.) Vectorize my text data for training
2.) Train a logistic regression model
3.) Pickle trained model and fit vectorizer
4.) Load model/vectorizer into separate python file
5.) Convert python file into .exe file using PyInstaller
6.) Call python.exe file in Excel VBA, classifying text data as required
This procedure works, and I can call the model to classify text. However, a big problem is how long it takes to execute the .exe file. The python code that loads the pickled data only takes 10-20 seconds to execute. On the other hand, the .exe derived from this code takes at least 5 minute to execute given just a single string entry to classify.
The .exe file is also very large at 305 MB. This is probably due to installation of the sklearn modules needed to load the model and vectorizer. I am not sure if there is a lighter weight solution for this. I am certain there is a better alternative for a simple logistic regression model.
Ultimately, I am looking for any suggestions on how I may run my machine learning models in excel efficiently.
Sample of Python Code
Generate Logistic Regression Model and Pickle
from sklearn.externals import joblib
''' https://www.geeksforgeeks.org/saving-a-machine-learning-model/ '''
path = "N:\\Drive_Location\\Pickles"
model_string = "eSOMS_Logistic_Regression_Model.pkl"
vectorizer_transformer_string = "eSOMS_Logistic_Regression_vectorizer_transformer.pkl"
model_full_path = path +'\\' + model_string
vectorizer_full_path = path +'\\' + vectorizer_transformer_string
''' Save the model as a pickle in a file '''
joblib.dump(classifier, model_full_path)
joblib.dump(vectorizer.transform, vectorizer_full_path)
logistic_regression_from_joblib = joblib.load(model_full_path)
vectorizer_transformer_from_joblib = joblib.load(vectorizer_full_path)
Load the Pickled File into Separate Python File
from sklearn.externals import joblib
path = "N:\\Drive_Location\\Pickles"
model_string = "eSOMS_Logistic_Regression_Model.pkl"
vectorizer_transformer_string = "eSOMS_Logistic_Regression_vectorizer_transformer.pkl"
model_full_path = path +'\\' + model_string
vectorizer_full_path = path +'\\' + vectorizer_transformer_string
logistic_regression_from_joblib = joblib.load(model_full_path)
vectorizer_transformer_from_joblib = joblib.load(vectorizer_full_path)
import sys
argument = sys.argv[1]
def myfunction(string):
print (logistic_regression_from_joblib.predict(vectorizer_transformer_from_joblib([string])))
myfunction(argument)
Create .Exe File in Command Prompt using PyInstaller
In command prompt, I created an executable file from the .py file.
pyinstaller --onefile eSOMS_MS_Executable.spec
NOTE: I adjusted the .spec file fed to PyInstaller to up the recursion limit and to specify numerous hidden imports:
# -*- mode: python -*-
import sys
sys.setrecursionlimit(5000)
block_cipher = None
a = Analysis(['eSOMS_MS_Executable.py'],
pathex=['C:\\Users\\Username\\Desktop\\Python\\eSOMS'],
binaries=[],
datas=[],
hiddenimports = ['sklearn.linear_model', 'sklearn.utils.sparsetools._graph_validation', 'sklearn.utils.sparsetools._graph_tools', 'sklearn.neighbors.typedefs', 'sklearn.feature_extraction'],
hookspath=[],
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
cipher=block_cipher)
exe = EXE(pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
[],
name='eSOMS_MS_Executable',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
runtime_tmpdir=None,
console=True )
Code works, but it's just too slow to be practical.