
Is it possible to fit a mixed-effects regression model in Spark, as we can with lme4 in R, MixedModels in Julia, or statsmodels MixedLM in Python?
Any example would be great.

I've read that there is a GLMix function, but I don't know whether a user can call it directly to fit a model and get the coefficients and p-values, or whether it is only used internally by machine-learning libraries.

I would like to move to Spark because my datasets are much bigger than memory.

Is there any other common database or framework that can do something like that by streaming data from disk?
I've only seen ones that can do simple linear regression.

Regards

skan

1 Answer

Yes, this is definitely possible with Spark.

The first thing I would look into is Spark's own machine-learning library, MLlib. I am not sure whether it supports exactly the kind of model you need, but it definitely does more than simple linear regression.
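Whatever library ends up being used, it may help to see that the simplest mixed model, a random intercept y_ij = mu + u_i + e_ij with a single grouping factor, needs only per-group counts, means and sums of squared deviations. These can be accumulated in one pass over rows streamed from disk (or with a grouped aggregation in Spark), so memory grows with the number of groups rather than the number of rows. The following is a minimal sketch under those assumptions, not part of MLlib or any library mentioned here. `GroupStats` and `fit_random_intercept` are hypothetical names. The variance components use the ANOVA (method-of-moments) estimator rather than the REML that lme4 and statsmodels MixedLM use by default. No p-values are produced.

```python
import csv
import io
from collections import defaultdict


class GroupStats:
    """Running count, mean and sum of squared deviations for one group (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def push(self, y):
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (y - self.mean)


def fit_random_intercept(rows):
    """One pass over (group, y) rows for y_ij = mu + u_i + e_ij.

    Returns (mu, var_u, var_e, blups). Variance components use the ANOVA
    (method-of-moments) estimator, not REML. Hypothetical helper for illustration.
    """
    groups = defaultdict(GroupStats)
    for g, y in rows:  # rows can be a generator reading a file lazily
        groups[g].push(float(y))

    k = len(groups)
    n_total = sum(s.n for s in groups.values())
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more rows than groups")

    grand = sum(s.n * s.mean for s in groups.values()) / n_total
    ms_within = sum(s.m2 for s in groups.values()) / (n_total - k)
    ms_between = sum(s.n * (s.mean - grand) ** 2 for s in groups.values()) / (k - 1)
    n0 = (n_total - sum(s.n ** 2 for s in groups.values()) / n_total) / (k - 1)

    var_e = ms_within
    var_u = max(0.0, (ms_between - ms_within) / n0)  # truncate negative estimates at 0
    if var_u == 0.0 and var_e == 0.0:
        raise ValueError("all responses are identical")

    # GLS estimate of mu: weight each group mean by 1 / Var(group mean).
    w = {g: 1.0 / (var_u + var_e / s.n) for g, s in groups.items()}
    mu = sum(w[g] * s.mean for g, s in groups.items()) / sum(w.values())

    # BLUP of each u_i: the group mean's deviation, shrunk by var_u / (var_u + var_e / n_i).
    blups = {g: var_u * w[g] * (s.mean - mu) for g, s in groups.items()}
    return mu, var_u, var_e, blups


# Usage: stream a CSV (here an in-memory stand-in for open(path, newline="")).
data = io.StringIO("group,y\na,1\na,3\nb,5\nb,7\nc,9\nc,11\n")
reader = csv.reader(data)
next(reader)  # skip the header row
mu, var_u, var_e, blups = fit_random_intercept((g, y) for g, y in reader)
print(mu, var_u, var_e, blups)  # 6.0 15.0 2.0 {'a': -3.75, 'b': 0.0, 'c': 3.75}
```

Welford updates are used instead of raw sums of squares because they avoid catastrophic cancellation on long streams. Models with random slopes or several crossed grouping factors do not reduce to such simple per-group statistics.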

Another library, linkedin/photon-ml, which I am not familiar with, does explicitly mention mixed-effect models.

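Here is an example of using its GAME (Generalized Additive Mixed Effect) training driver. As written, it configures a single coordinate, "global", with no random-effect grouping, so it trains an ordinary L2-regularized logistic regression. Fitting an actual mixed model requires adding random-effect coordinates as well: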

spark-submit \
  --class com.linkedin.photon.ml.cli.game.GameTrainingDriver \
  --master local[*] \
  --num-executors 4 \
  --driver-memory 1G \
  --executor-memory 1G \
  "./build/photon-all_2.10/libs/photon-all_2.10-1.0.0.jar" \
  --input-data-directories "./a1a/train/" \
  --validation-data-directories "./a1a/test/" \
  --root-output-directory "out" \
  --feature-shard-configurations "name=globalShard,feature.bags=features" \
  --coordinate-configurations "name=global,feature.shard=globalShard,min.partitions=4,optimizer=LBFGS,tolerance=1.0E-6,max.iter=50,regularization=L2,reg.weights=0.1|1|10|100" \
  --coordinate-update-sequence "global" \
  --coordinate-descent-iterations 1 \
  --training-task "LOGISTIC_REGRESSION"
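The GAME idea of combining a global coordinate with per-entity coordinates can be illustrated with a toy model. This sketch assumes a linear (squared-error) model instead of the logistic task above. A global fixed-effect coordinate and a per-group random-intercept coordinate with an L2 penalty are refit alternately by block coordinate descent. `fit_game_linear`, `lam` and `iters` are hypothetical names. This is not photon-ml's code or API, which uses iterative optimizers such as LBFGS rather than the closed-form updates here. With `lam` equal to the variance ratio sigma_e^2 / sigma_u^2, the converged solution satisfies Henderson's mixed-model equations, so the group intercepts are BLUPs for that ratio. The sketch does not estimate the ratio itself and gives no p-values.

```python
def fit_game_linear(x, y, groups, lam=1.0, iters=100):
    """Toy GAME-style fit: y ~ b0 + b1 * x + u[group], with an L2 penalty lam on u.

    Alternates (block coordinate descent) between a global fixed-effect coordinate
    and a per-group random-intercept coordinate. Hypothetical helper, not photon-ml.
    """
    n = len(y)
    u = dict.fromkeys(groups, 0.0)  # one intercept per group, in first-seen order
    counts = dict.fromkeys(u, 0)
    for g in groups:
        counts[g] += 1
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b0 = b1 = 0.0
    for _ in range(iters):
        # Global coordinate: ordinary least squares of (y - u[group]) on x.
        r = [yi - u[g] for yi, g in zip(y, groups)]
        mr = sum(r) / n
        b1 = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
        b0 = mr - b1 * mx
        # Random-effect coordinate: closed-form ridge update for each group's intercept.
        sums = dict.fromkeys(u, 0.0)
        for xi, yi, g in zip(x, y, groups):
            sums[g] += yi - b0 - b1 * xi
        u = {g: sums[g] / (counts[g] + lam) for g in u}
    return b0, b1, u


x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 3, 6]
groups = ["a", "a", "a", "b", "b"]
b0, b1, u = fit_game_linear(x, y, groups, lam=1.0)
print(b0, b1, u)  # approximately 0.3875 1.3625 {'a': 0.4375, 'b': -0.4375}
```

Because the penalized least-squares objective is a strictly convex quadratic here, the alternating updates converge to its unique joint minimizer. In this example x is correlated with the grouping, so the first pass is not yet the final answer, and several iterations are needed.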
Dennis Jaheruddin
    MLlib does not support mixed-effects models, only basic GLMs with an L2 penalty (plus L1 and elastic net for linear and logistic regression, AFAIK). – Melkor.cz Nov 05 '20 at 09:33