Unfortunately it's not possible to just set the coefficients of a pyspark LR model. The pyspark LR model is actually a wrapper around a java ml model (see class JavaEstimator).
So when the LR model is fit, it transfers the params from the paramMap to a new java estimator, which is fit to the data. All the LogisticRegressionModel methods/attributes are just calls to the java model using the _call_java method.
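For example, the coefficients attribute on the python side is essentially a read-only property that forwards to the java model; pyspark's LogisticRegressionModel defines it along these lines (paraphrased):

@property
def coefficients(self):
    # delegates to the java model's coefficients via the py4j gateway
    return self._call_java("coefficients")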
Since the coefficients aren't params (you can see a comprehensive list using explainParams on an LR instance), you can't pass them to the java LR model that's created, and there is no setter method.
For example, for a logistic regression model lrm, you can see that the only setters are for the params you can set when you instantiate a pyspark LR instance: lowerBoundsOnCoefficients and upperBoundsOnCoefficients.
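You can verify this from the python side too. A quick sketch (it assumes an active SparkSession, since pyspark estimators are created through the JVM gateway, and the exact param list depends on your Spark version):

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.getOrCreate()
lr = LogisticRegression()

# explainParams() lists every settable param with its doc string;
# the coefficients themselves are not in the list
print(lr.explainParams())

# the only coefficient-related params are the bounds used for constrained optimization
print([p.name for p in lr.params if "coefficients" in p.name.lower()])
# >>> ['lowerBoundsOnCoefficients', 'upperBoundsOnCoefficients']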
print([c for c in lrm._java_obj.__dir__() if "coefficient" in c.lower()])
# >>> ['coefficientMatrix', 'lowerBoundsOnCoefficients',
# 'org$apache$spark$ml$classification$LogisticRegressionParams$_setter_$lowerBoundsOnCoefficients_$eq',
# 'getLowerBoundsOnCoefficients',
# 'org$apache$spark$ml$classification$LogisticRegressionParams$_setter_$upperBoundsOnCoefficients_$eq',
# 'getUpperBoundsOnCoefficients', 'upperBoundsOnCoefficients', 'coefficients',
# 'org$apache$spark$ml$classification$LogisticRegressionModel$$_coefficients']
Trying to set the "coefficients" attribute yields this:
print(lrm.coefficients)
# >>> DenseVector([18.9303, -18.9303])
lrm.coefficients = [10, -10]
# >>> AttributeError: can't set attribute
So you'd have to roll your own pyspark transformer if you want to be able to provide coefficients. It would probably be easier just to calculate results using the standard logistic function as per @pault's comment.
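A minimal sketch of that second approach (the coefficient values, column name, and example rows below are made up):

import math
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

# hypothetical coefficients and intercept you want to use instead of fitting
coefficients = [10.0, -10.0]
intercept = 0.0

@udf(returnType=DoubleType())
def predict_proba(features):
    # linear predictor followed by the standard logistic (sigmoid) function
    margin = sum(x * w for x, w in zip(features.toArray(), coefficients)) + intercept
    return 1.0 / (1.0 + math.exp(-margin))

df = spark.createDataFrame(
    [(Vectors.dense(1.0, 0.5),), (Vectors.dense(0.2, 2.0),)],
    ["features"],
)
df.withColumn("probability", predict_proba("features")).show()
# probabilities for the two rows above come out around 0.9933 and 1.5e-08

This gives you the scores, but nothing else that a fitted LogisticRegressionModel provides (summary, save/load, pipeline integration), which is the trade-off versus writing your own transformer.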