I'm trying to model some data with logistic regression from Spark MLlib. For model creation I have the following columns:

ID,
features,
label

I can split it into training and test data via

(trainsample,testsample) =  sample.randomSplit([0.7, 0.3], seed)

Also, I can define my model:

lr = LogisticRegression(featuresCol="features", labelCol="label", 
predictionCol="prediction")

Then I can train and test it with:

lrmodel = lr.fit(trainsample)
result = lrmodel.transform(testsample)

All fine. But now I want to use my model to predict unlabeled data, and I always get the following error:

IllegalArgumentException: 'Field "label" does not exist 

I tried creating a dummy label column (all values 999). But then all my predictions fall into a single class (class 6, out of 7 classes). So the label seems to influence my predictions, even with an already trained model.

Maybe "lrmodel.transform" is only meant for testing and there is a different syntax for applying the model to new data, but I couldn't find anything on this topic. Any help would be appreciated.

fwnugg
  • That sounds weird; the label should not affect your model's predictions. It is only used for model evaluation. Transform performs a "forward" pass, so you should get a prediction regardless of whether you are doing cross-validation or testing – Emiliano Martinez Dec 11 '18 at 09:19
  • Yeah, that's what I thought. It's also weird that I'm getting the IllegalArgumentException: 'Field "label" does not exist when I run: result = lrmodel.transform(unlabeled_data) ;/ – fwnugg Dec 11 '18 at 11:44
  • Could you provide a [mcve] please (https://stackoverflow.com/q/48427185/6910411)? – zero323 Dec 11 '18 at 12:03

1 Answer

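Found the issue: I had included the label in my feature set x_x. That is why the label column was required at prediction time and why the dummy value changed the predictions. Thanks for your help.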
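One way to guard against this kind of leakage is to build the list of feature input columns explicitly and exclude the label and ID before assembling the vector. The following is only a minimal sketch under assumptions: the raw column names `f1` and `f2` and the helper names `feature_columns` and `assemble_features` are hypothetical (the question does not show how `features` was built), and it assumes the remaining columns are numeric and the DataFrame does not already have a `features` column.

```python
def feature_columns(all_columns, exclude=("ID", "label")):
    """Return the columns that may go into the feature vector."""
    excluded = set(exclude)
    return [c for c in all_columns if c not in excluded]


def assemble_features(df, exclude=("ID", "label")):
    """Add a "features" vector column built only from non-excluded columns."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.ml.feature import VectorAssembler

    assembler = VectorAssembler(
        inputCols=feature_columns(df.columns, exclude),
        outputCol="features",
    )
    return assembler.transform(df)


# Hypothetical raw column names, for illustration only.
raw_columns = ["ID", "f1", "f2", "label"]
print(feature_columns(raw_columns))  # ['f1', 'f2']
```

With the features assembled this way, the same helper should work on unlabeled data that has no `label` column at all, and `lrmodel.transform(...)` then only needs the `features` column.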

fwnugg