I would like to use a pretrained xgboost classifier in pyspark, but the nodes on the cluster don't have the xgboost module installed. I can pickle the classifier I have trained and broadcast it, but this isn't enough, as I still need the module to be importable on each cluster node.
I can't install it on the cluster nodes myself, since I don't have root access and there is no shared file system.
How can I distribute the xgboost classifier for use in Spark?
I have an egg for xgboost. Could an approach like http://apache-spark-user-list.1001560.n3.nabble.com/Loading-Python-libraries-into-Spark-td7059.html or https://stackoverflow.com/a/24686708/2179021 work?
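To make the question concrete, here is a minimal sketch of what I have in mind, assuming `sc.addPyFile` can ship the egg to the executors so that `import xgboost` succeeds inside worker tasks. The function names, the `egg_path` argument, and the empty placeholder RDD are all hypothetical; only `addPyFile`, `broadcast`, and `mapPartitions` are real pyspark APIs.

```python
import pickle


def score_partition(rows, model_broadcast):
    # Runs on the worker. The import is deferred so it is only attempted
    # after addPyFile has (hopefully) put the egg on the worker's sys.path.
    import xgboost  # noqa: F401  -- assumed to resolve from the shipped egg

    # Deserialize the broadcast model once per partition, not once per row.
    model = pickle.loads(model_broadcast.value)
    for features in rows:
        yield model.predict(features)


def run(sc, model, egg_path):
    # egg_path is a hypothetical driver-local path to the xgboost egg.
    # addPyFile ships the file to every executor and adds it to PYTHONPATH.
    sc.addPyFile(egg_path)

    # Broadcast the pickled model so each node fetches it only once.
    model_broadcast = sc.broadcast(pickle.dumps(model))

    # Placeholder input RDD; in practice this would hold feature vectors.
    data = sc.parallelize([])
    return data.mapPartitions(lambda rows: score_partition(rows, model_broadcast))
```

The idea is that deferring `import xgboost` into the partition function keeps the driver from needing the module at job-definition time, while `addPyFile` covers the workers. I'm unsure whether an egg with compiled extensions (as xgboost has) survives this route, which is part of what I'm asking.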