Task data locality NO_PREF. When is it used?

Question

According to the Spark documentation, there are five levels of data locality:

PROCESS_LOCAL
NODE_LOCAL
NO_PREF
RACK_LOCAL
ANY

All of them are pretty clear to me apart from NO_PREF (from the Spark documentation: "data is accessed equally quickly from anywhere and has no locality preference").

In which cases is NO_PREF used?

score 1 · Accepted Answer · answered Apr 15 '16 at 11:30

One of the characteristics of an RDD is the list of preferred locations for each of its partitions. For example, if the RDD's source is an HDFS file, the preferred locations should be the data nodes where the blocks are physically stored. But if it makes no difference where the data is read from, or Spark is unable to determine preferred locations, Spark creates the tasks that process such an RDD with data locality set to NO_PREF.
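To make the distinction concrete, here is a toy model in Python. It is not Spark's scheduler code: the `Locality` enum, the `locality_for` function and its parameters (`preferred_hosts`, `cached_on`, `rack_of`) are hypothetical names chosen for illustration. It assumes that a partition either reports some preferred hosts (as an HDFS-backed partition would) or reports none, and that a task with no preference at all is classified as NO_PREF before any host or rack matching is attempted.

```python
from enum import IntEnum
from typing import Dict, Sequence


class Locality(IntEnum):
    # Hypothetical enum: the five levels in the order the Spark docs list
    # them, from closest to farthest.
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    NO_PREF = 2
    RACK_LOCAL = 3
    ANY = 4


def locality_for(preferred_hosts: Sequence[str],
                 cached_on: Sequence[str],
                 executor_id: str,
                 executor_host: str,
                 rack_of: Dict[str, str]) -> Locality:
    """Toy classifier: how local would this task be on this executor?

    preferred_hosts: hosts the partition reports as preferred
                     (e.g. HDFS data nodes holding the block)
    cached_on:       executor IDs holding the partition in memory
    rack_of:         hypothetical host -> rack mapping
    """
    # No preferred locations at all: the data is equally cheap to reach from
    # anywhere (or Spark cannot tell), so the task has no preference.
    if not preferred_hosts and not cached_on:
        return Locality.NO_PREF
    # Partition already in this executor's memory.
    if executor_id in cached_on:
        return Locality.PROCESS_LOCAL
    # Data lives on the same machine as the executor.
    if executor_host in preferred_hosts:
        return Locality.NODE_LOCAL
    # Data lives on another machine in the same rack.
    rack = rack_of.get(executor_host)
    if rack is not None and any(rack_of.get(h) == rack for h in preferred_hosts):
        return Locality.RACK_LOCAL
    # Data must be fetched from elsewhere in the cluster.
    return Locality.ANY


if __name__ == "__main__":
    racks = {"dn1": "r1", "dn2": "r1", "w9": "r2"}
    # Partition of an HDFS file whose block is stored on dn1.
    print(locality_for(["dn1"], [], "exec-1", "dn1", racks).name)  # NODE_LOCAL
    print(locality_for(["dn1"], [], "exec-2", "dn2", racks).name)  # RACK_LOCAL
    print(locality_for(["dn1"], [], "exec-3", "w9", racks).name)   # ANY
    # Partition that reports no preferred locations.
    print(locality_for([], [], "exec-3", "w9", racks).name)        # NO_PREF
```

The point of the model is that NO_PREF is not a property of where a task ends up running but of the partition itself: when the preferred-location list is empty, every executor is an equally good (or equally unknown) choice.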
