I am using Databricks with the 7.4 ML runtime.

The following code successfully logs the params and metrics, and I can see ROCcurve.png in the MLflow UI (just the item in the tree below the model). But the actual plot is blank. Why?

with mlflow.start_run(run_name="logistic-regression") as run:
  pipeModel = pipe.fit(trainDF)
  mlflow.spark.log_model(pipeModel, "model")
  predTest = pipeModel.transform(testDF)
  predTrain = pipeModel.transform(trainDF)
  evaluator=BinaryClassificationEvaluator(labelCol="arrivedLate")
  trainROC = evaluator.evaluate(predTrain)
  testROC = evaluator.evaluate(predTest)
  print(f"Train ROC: {trainROC}")
  print(f"Test ROC: {testROC}")
  mlflow.log_param("Dataset Name", "Flights " + datasetName)
  mlflow.log_metric(key="Train ROC", value=trainROC)
  mlflow.log_metric(key="Test ROC", value=testROC)

  lrModel = pipeModel.stages[3]
  trainingSummary = lrModel.summary
  roc = trainingSummary.roc.toPandas()
  plt.plot(roc['FPR'],roc['TPR'])
  plt.xlabel('False Positive Rate')
  plt.ylabel('True Positive Rate')
  plt.title('ROC Curve')
  plt.show()
  plt.savefig("ROCcurve.png")
  mlflow.log_artifact("ROCcurve.png")
  plt.close()
  
  display(predTest.select(stringCols + ["arrivedLate", "prediction"]))

What the notebook shows: [image: the ROC curve plot]

What MLflow shows: [image: a blank plot]

mck
Dr.YSG

2 Answers

import mlflow 
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2)
x0, y0 = [1,2,3], [1,2,3]
x1, y1 = [1,2,3], [1,2,3]
axs[0].plot(x0, y0)
axs[1].plot(x1, y1)
mlflow.log_figure(fig, 'my_plot.png')
Yapi
  • You can additionally add a folder name if you want to organise your plots a little better. MLflow will create the folder automatically. For example: `mlflow.log_figure(fig, 'plots/my_plot.png')`. – Jakob May 03 '23 at 10:15

Put plt.show() after plt.savefig(). Once plt.show() has displayed the figure, the figure is released, so a later plt.savefig() writes a new, empty figure instead of your plot.
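
A minimal sketch of the reordered plotting step, assuming the matplotlib `Agg` backend so that it runs without a display. The helper name `save_roc_plot` and the sample points are illustrative and not taken from the question. It saves through the figure object, so the written file does not depend on which figure pyplot currently treats as active.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def save_roc_plot(fpr, tpr, path):
    """Draw an ROC curve and write it to `path` before the figure is released."""
    fig, ax = plt.subplots()
    ax.plot(fpr, tpr)
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    ax.set_title("ROC Curve")
    fig.savefig(path)  # save first, while the figure still holds the plot
    plt.close(fig)     # release the figure only after the file is on disk
    return path


# Illustrative points only; in the question they come from roc['FPR'] and roc['TPR'].
out_path = save_roc_plot(
    [0.0, 0.1, 0.4, 1.0],
    [0.0, 0.6, 0.9, 1.0],
    os.path.join(tempfile.mkdtemp(), "ROCcurve.png"),
)
print(os.path.getsize(out_path) > 0)
```

Inside the run, `mlflow.log_artifact(out_path)` would then upload a non-empty image. If the inline display is still wanted in the notebook, `plt.show()` can be called after `savefig` and before the figure is closed.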

mck
  • *sigh* after 52 years of programming, there are always things that make me seem like a complete newbie. – Dr.YSG Dec 04 '20 at 15:24
  • My code seems to give the ROC for the training set. How would I get it for the Test set? – Dr.YSG Dec 04 '20 at 15:26
  • @Dr.YSG can't tell it from the code... if you could just open another question and provide all necessary code that would be great. – mck Dec 04 '20 at 15:29
  • @Dr.YSG in the library you are using (Spark ML), the `summary` attribute refers to the training data: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.regression.LinearRegressionModel.html#pyspark.ml.regression.LinearRegressionModel.summary You may need a bit of custom code, e.g. with `scikit-learn`, to create ROC curves on test data. – Maciej Skorski Sep 27 '22 at 11:08
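
The "bit of custom code" mentioned in the last comment could look roughly like the sketch below. It is a dependency-free illustration of how ROC points are derived from labels and scores, not the commenter's code; scikit-learn's `sklearn.metrics.roc_curve` covers the same ground. The function name `roc_points` and the sample data are hypothetical. In the notebook, the labels and positive-class probabilities would first have to be collected from `predTest`, which assumes the pipeline exposes them as columns.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists by sweeping a threshold over the scores, highest first."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so ties share one threshold.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


# Hypothetical held-out labels and predicted probabilities of the positive class.
test_labels = [1, 0, 1, 1, 0, 0]
test_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
fpr, tpr = roc_points(test_labels, test_scores)
```

The resulting `fpr` and `tpr` lists can be plotted the same way as `roc['FPR']` and `roc['TPR']` in the question, which gives a curve for the test set instead of the training set.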