I know this may sound silly, but is there any way to create an RDD from files that are currently in the cluster slaves' memory? I know that to create an RDD we have to specify a path (e.g. an HDFS path) where the files are stored. But I am curious: if I can copy objects between Java applications and put an object directly into the slaves' memory under the same name, is there any way to create an RDD from these objects and/or work on them in a distributed manner? Thanks in advance!
1 Answer
The short answer is no.
"Slaves" don't participate in computation at all. There are only responsible for resource management part.
Workers from the other hand don't exist by itself. There are tied to an application so there is no "current state" outside it.
What you can do is create a dummy RDD and load the objects when calling functions on it (see the sketch below). This, however, should never be tied to specific physical hosts. While Spark has some support for hinting at preferred locations, there is no guarantee that a particular task will be processed on a particular machine, or that the assignment will stay constant across different evaluations, even within the same application.
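As a rough sketch of that idea (assuming a plain Scala Spark application; `loadObjectsForShard` is a hypothetical loader, not a real Spark API), you could parallelize a set of shard ids and only materialize the actual objects inside `mapPartitions`. The second snippet uses `SparkContext.makeRDD`, which accepts per-element preferred locations, purely as a scheduling hint:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DummyRddSketch {

  // Hypothetical loader: in practice this should read from a shared,
  // location-independent source (HDFS, a database, an object store, ...),
  // not from another JVM's memory.
  def loadObjectsForShard(id: Int): Iterator[String] =
    Iterator(s"object-$id-a", s"object-$id-b")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dummy-rdd-sketch"))

    // "Dummy" RDD: one shard id per partition; the real data is only
    // produced when an action forces mapPartitions to run on the executors.
    val shardIds = sc.parallelize(0 until 8, numSlices = 8)
    val data = shardIds.mapPartitions(ids => ids.flatMap(loadObjectsForShard))
    println(data.count())

    // Preferred locations are only a hint: makeRDD lets you attach hostnames
    // to elements, but Spark may still schedule the tasks elsewhere.
    val hinted = sc.makeRDD(Seq(
      (0, Seq("host-a.example.com")),
      (1, Seq("host-b.example.com"))
    ))
    println(hinted.collect().mkString(", "))

    sc.stop()
  }
}
```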

user9278188