
How do I create a custom model in PySpark?

In scikit-learn this is easy (see "Rolling your own estimator" in the developer docs).

In PySpark, however, I cannot find any similar documentation.

From reading the source code, I found that there are three relevant base classes: Model, Estimator, and Transformer.

However, it is not clear to me whether I should inherit from Model, Estimator, or both. In particular, the inheritance from the Params mixins is complicated.
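To make the question concrete, here is the pattern I have pieced together so far. The MeanCenterer example, its mean-centering logic, and the way I wire the Params mixins are entirely my own guesswork, so I do not know whether this is the intended approach:

```python
# Minimal Estimator/Model pair, pieced together from the pyspark.ml source.
# Assumes an existing SparkSession named `spark`.
from pyspark.ml import Estimator, Model
from pyspark.ml.param.shared import HasInputCol, HasOutputCol
from pyspark.sql import functions as F


class MeanCenterer(Estimator, HasInputCol, HasOutputCol):
    """Learns the mean of inputCol; fit() returns a MeanCentererModel."""

    def __init__(self, inputCol=None, outputCol=None):
        super(MeanCenterer, self).__init__()
        if inputCol is not None:
            self._set(inputCol=inputCol)
        if outputCol is not None:
            self._set(outputCol=outputCol)

    def _fit(self, dataset):
        # fit(), defined on Estimator, delegates to this method.
        mean = dataset.agg(F.avg(self.getInputCol())).first()[0]
        model = MeanCentererModel(mean=mean)
        # _set returns self, so this hands back the configured model.
        return model._set(inputCol=self.getInputCol(),
                          outputCol=self.getOutputCol())


class MeanCentererModel(Model, HasInputCol, HasOutputCol):
    """Subtracts the learned mean; Model is itself a Transformer."""

    def __init__(self, mean=0.0):
        super(MeanCentererModel, self).__init__()
        self.mean = mean

    def _transform(self, dataset):
        # transform(), defined on Transformer, delegates to this method.
        return dataset.withColumn(
            self.getOutputCol(),
            F.col(self.getInputCol()) - self.mean)


df = spark.createDataFrame([(1.0,), (2.0,), (6.0,)], ["x"])
model = MeanCenterer(inputCol="x", outputCol="x_centered").fit(df)
model.transform(df).show()
```

This appears to run, but I cannot tell whether it is idiomatic, e.g. whether the params should be copied from the estimator to the model differently, or how persistence is supposed to work.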

I've seen a similar answer here, but it is almost a year old and I suspect things have changed since then. It also seems to refer to the mllib rather than the ml version of Spark.

Hanan Shteingart
  • mllib is Spark: https://spark.apache.org/mllib/ – David Mar 05 '18 at 21:42
  • @David, there are two ML packages in Spark, mllib and ml: https://stackoverflow.com/questions/38835829/whats-the-difference-between-spark-ml-and-mllib-packages – Hanan Shteingart Mar 06 '18 at 10:11
  • Ah, I misread your question. Depending on your use case, it seems like you could potentially use a transformer or an estimator. A "model" is a chain (pipeline) of transformations and potentially an estimation. For more details, see https://spark.apache.org/docs/2.2.0/ml-pipeline.html – David Mar 06 '18 at 15:19
  • Your link does not explain how to create your own estimator. – Hanan Shteingart Mar 07 '18 at 14:49

0 Answers