I am deploying my program on a Spark cluster, and I need to give each node a specific list of data that I choose. How can I do this? I created an RDD from my data, but I don't know how to pass a specific part of the data to each node.
- Very vague. What do you want to pass specifically? – thebluephantom Oct 15 '20 at 10:42
- @thebluephantom I need to pass a block of data to each machine, but I need to specify which block (and its size and content) goes to which machine – Ley Big Oct 18 '20 at 15:44
- Not sure that is possible unless it is done outside of Spark's control, and you do not know which executors are on which node. – thebluephantom Oct 18 '20 at 16:01
2 Answers
0
I don't think you can pass a specific list to a specific node. If your data has unique keys, you can use hash partitioning to send records with the same key to the same partition.
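As a rough illustration of the key-based approach, here is a minimal PySpark sketch. It assumes the data is an RDD of (key, value) pairs; the keys, the `KEY_TO_PARTITION` table, and the helper names are hypothetical and only there for the example. Note that this decides which partition a record lands in, not which machine runs that partition.

```python
import zlib

# Hypothetical table: which partition each key should go to.
# These keys and partition numbers are made up for illustration.
KEY_TO_PARTITION = {"a": 0, "b": 0, "c": 1, "d": 2}
NUM_PARTITIONS = 3


def choose_partition(key):
    """Return the partition index for a key.

    Keys missing from the table fall back to a CRC32 hash, which is
    deterministic across Python processes (unlike the built-in hash()
    for strings).
    """
    if key in KEY_TO_PARTITION:
        return KEY_TO_PARTITION[key]
    return zlib.crc32(str(key).encode("utf-8")) % NUM_PARTITIONS


def partition_with_spark(records):
    """Partition (key, value) pairs and return the contents of each partition."""
    # Imported here so choose_partition can be tried without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = sc.parallelize(records)
    # partitionBy takes the number of partitions and a function key -> int.
    placed = rdd.partitionBy(NUM_PARTITIONS, choose_partition)
    # glom() turns each partition into a list so the layout can be inspected.
    return placed.glom().collect()
```

Calling `partition_with_spark([("a", 1), ("b", 2), ("c", 3), ("d", 4)])` on a working Spark installation should return one list per partition, with the pairs grouped according to the table. Spark still chooses which executor processes each partition.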

david gupta
- Yes, they have unique keys. But how can I use the hash technique you are talking about? – Ley Big Oct 15 '20 at 07:56
- https://stackoverflow.com/questions/31424396/how-does-hashpartitioner-work – david gupta Oct 15 '20 at 08:30
0
Not possible, as you have no control over which Worker Nodes are allocated, and N Executors may be on the same Worker Node.

thebluephantom