I want to train a different model for each user in my dataset. Is there built-in support for that in Spark MLlib/Pipelines?
If not, what's the easiest/cleanest way to train multiple, separate models, one per user?
Unfortunately, Spark ML doesn't provide built-in support for the "single model -> single user" concept, but you can implement custom logic for it. I see two possible ways to solve this task.
The first approach follows the algorithm below (the details are made up for the sake of example; your steps will differ, but the logic will be similar). Build a Dataset from the user-related data. Suppose your dataset has two columns, a specific criterion X and the user's productivity Y, and the latter varies from one user group to another. You then train a separate model for each user (or user group), for instance with LinearRegression, to predict whether the user can finish the work in time or not.

The second approach is to train a single model that is applicable to every user. You must choose the options for the algorithm so that it does not depend on the user group; in other words, generalize the training to all user groups. In this case there is no "single model -> single user" separation. If the second variant is harder to implement on your dataset, follow the first approach.