I'm trying to model some data with logistic regression from Spark MLlib. To build the model I have the following columns:
ID,
features,
label
I can split it into training and test data via
(trainsample,testsample) = sample.randomSplit([0.7, 0.3], seed)
Also, I can define my model:
from pyspark.ml.classification import LogisticRegression
lr = LogisticRegression(featuresCol="features", labelCol="label",
                        predictionCol="prediction")
Then I can train and test it with:
lrmodel = lr.fit(trainsample)
result = lrmodel.transform(testsample)
All fine. But now I want to use my model to predict on unlabeled data, and I always get the following error:
IllegalArgumentException: 'Field "label" does not exist
I tried to create a dummy label column (all values 999). But then all my predictions belong to one class (class 6 out of 7 different classes). So the label seems to influence my predictions, even with a pretrained model.
Maybe "lrmodel.transform" is only meant for testing and there is a different syntax for applying the model, but I couldn't find anything on this topic. Any help would be appreciated.