
How can I set the checkpoint directory for a PySpark session on IBM's Data Science Experience?

I need this because I have to run connectedComponents() from GraphFrames, and it raises the following error:

Py4JJavaError: An error occurred while calling o221.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir(). 
Marco
ElBrocas

1 Answer


The main step is to find the notebook's working directory, so that a path under it can be passed to sc.setCheckpointDir(). You can get it with:

!pwd

Then create a directory for checkpoints under that path:

!mkdir <pwd_output>/checkpoints

Finally, set the checkpoint directory:

spark.sparkContext.setCheckpointDir('<pwd_output>/checkpoints')
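
The three steps above can also be done from Python in a single cell, without copying the `!pwd` output by hand. This is only a sketch: `ensure_checkpoint_dir` is a hypothetical helper name, and it assumes that `os.getcwd()` returns the same path that `!pwd` prints and that Spark can write to that local path, as the steps above imply.

```python
import os

def ensure_checkpoint_dir(sc, base_dir=None, name="checkpoints"):
    """Create <base_dir>/<name> if needed and register it as the checkpoint dir.

    `sc` is a SparkContext; `base_dir` defaults to the current working
    directory (assumed to match the output of `!pwd` in the notebook).
    """
    base_dir = base_dir or os.getcwd()
    path = os.path.join(base_dir, name)
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    sc.setCheckpointDir(path)
    return path

# In the notebook, where `spark` is the existing SparkSession:
# ensure_checkpoint_dir(spark.sparkContext)
```

After this call, connectedComponents() should find the checkpoint directory set. Keep in mind that a local path like this is only suitable when the executors can reach the same filesystem; on a real cluster Spark expects an HDFS-compatible path.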
ElBrocas