
According to the Spark documentation, there are five levels of data locality:

  • PROCESS_LOCAL
  • NODE_LOCAL
  • NO_PREF
  • RACK_LOCAL
  • ANY

All of them are pretty clear to me apart from NO_PREF (from the Spark docs: "data is accessed equally quickly from anywhere and has no locality preference").

In what case would NO_PREF be used?

loba76

1 Answer

One of the characteristics of an RDD is a list of preferred locations for each partition. For example, if the RDD's source is an HDFS file, the preferred locations are the data nodes where that partition's blocks are physically stored. But if it makes no difference where the data comes from, or Spark is unable to determine preferred locations, Spark creates the tasks for such an RDD with data locality set to NO_PREF.
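To make the rule concrete, here is a minimal sketch that classifies a task's locality level from its preferred locations and an executor offer. It is a simplified model, not Spark's scheduler code: `TaskLocation`, `Offer`, `locality_level` and the `rack_of` host-to-rack map are hypothetical names, and the real `TaskSetManager` additionally uses delay scheduling (`spark.locality.wait`) before relaxing the level. The point it shows is that NO_PREF is decided by the task alone (an empty preference list), while the other levels compare the preferences against where the task would run.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional, Sequence


class Locality(IntEnum):
    """Levels in the order listed in the Spark docs, most local first."""
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    NO_PREF = 2
    RACK_LOCAL = 3
    ANY = 4


@dataclass(frozen=True)
class TaskLocation:
    """Hypothetical model of one preferred location for a partition.

    executor_id is set only when the partition is cached on that executor.
    """
    host: str
    executor_id: Optional[str] = None


@dataclass(frozen=True)
class Offer:
    """Hypothetical model of free resources on one executor."""
    executor_id: str
    host: str
    rack: Optional[str] = None


def locality_level(prefs: Sequence[TaskLocation], offer: Offer,
                   rack_of: Dict[str, str]) -> Locality:
    """Classify how local it would be to run a task on `offer`."""
    # No preferred locations at all: every executor is equally good.
    if not prefs:
        return Locality.NO_PREF
    # The partition is cached inside this executor's process.
    if any(p.executor_id == offer.executor_id for p in prefs
           if p.executor_id is not None):
        return Locality.PROCESS_LOCAL
    # The data is on the same machine, but not in this executor's process.
    if any(p.host == offer.host for p in prefs):
        return Locality.NODE_LOCAL
    # The data is on another machine in the same rack.
    if offer.rack is not None and any(rack_of.get(p.host) == offer.rack
                                      for p in prefs):
        return Locality.RACK_LOCAL
    # The data has to be fetched from elsewhere in the cluster.
    return Locality.ANY


if __name__ == "__main__":
    racks = {"host1": "rackA", "host2": "rackA", "host3": "rackB"}
    offer = Offer(executor_id="exec-7", host="host2", rack="rackA")

    # Like an HDFS split whose block replicas live on host1 and host3.
    hdfs_split = [TaskLocation("host1"), TaskLocation("host3")]
    print(locality_level(hdfs_split, offer, racks).name)  # RACK_LOCAL

    # A partition with no preferred locations at all.
    print(locality_level([], offer, racks).name)  # NO_PREF
```

In real Spark, an RDD built with `sc.parallelize` and no location hints is a typical case of the second kind: its partitions report no preferred locations, so its tasks end up as NO_PREF.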

Dmitry Y.