How does Spark perform I/O?

Question

It is my understanding, based on other Stack Overflow answers, that Spark uses parallel I/O to read files.

My question is: does Spark read data using an independent approach or a collective approach? In other words, does each worker read a fixed chunk of data on its own, or do the workers communicate with each other and coordinate to read the data efficiently?

score 1 · Answer 1 · edited Oct 30 '18 at 20:07


Each Apache Spark worker hosts executors, and workers can be deployed in distributed or standalone mode.
Each worker processes its own portion of the data. For more detail, see this answer or this link.
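To make "each worker processes its own portion" concrete, here is a minimal sketch in plain Python (no Spark involved) of independent, split-based reading of a line-oriented file. The names `plan_splits`, `read_split` and `demo` are hypothetical, and the boundary rule (a line belongs to the split in which its first byte falls) is an assumption chosen for illustration; it is similar in spirit to how Hadoop-style input splits are read, but it is not Spark's actual code.

```python
import os
import tempfile


def plan_splits(file_size, split_size):
    """Return non-overlapping (start, end) byte ranges covering the file."""
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def read_split(path, start, end):
    """Read the lines owned by one split, without consulting any other split.

    Assumed rule: a line is owned by the split containing its first byte,
    so a reader may run past `end` to finish the last line it started.
    """
    lines = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and consume up to the next newline, which
            # leaves the cursor on the first line start at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip(b"\n").decode("utf-8"))
    return lines


def demo():
    """Write a small file, then read every split independently."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"".join(b"row-%d\n" % i for i in range(20)))
    try:
        splits = plan_splits(os.path.getsize(tmp.name), 16)
        return [read_split(tmp.name, s, e) for s, e in splits]
    finally:
        os.remove(tmp.name)
```

Each call to `read_split` needs only the path and its own byte range, so the calls could run on different machines in any order. That is the "independent" model from the question, as opposed to a collective read in which readers coordinate with each other.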

edited Oct 30 '18 at 20:07

thebluephantom


answered Oct 30 '18 at 18:57

Yugerten


To install Spark Standalone mode, you simply place a compiled version of Spark on each node on the cluster. I am wondering if you are not confusing things? – thebluephantom Oct 30 '18 at 19:23
@thebluephantom it doesn't talk about installation – Yugerten Oct 30 '18 at 20:15

score 1 · Answer 2 · answered Nov 07 '18 at 23:26


The workers communicate through the driver, and each worker processes its own data.
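As a rough illustration of that coordination pattern, here is a toy sketch in plain Python in which threads stand in for workers. The names `driver` and `worker_task` are hypothetical and this is not Spark's API. It models only a simple read/map stage, where tasks never talk to each other and only return partial results to the driver.

```python
from concurrent.futures import ThreadPoolExecutor


def worker_task(partition):
    """Runs on a 'worker': sees only its own partition, returns a partial result."""
    return sum(len(record) for record in partition)


def driver(records, num_partitions):
    """Plays the driver: partitions the work, schedules tasks, merges results."""
    # Round-robin partitioning; each partition is handed to exactly one task.
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(worker_task, partitions))
    # Only the driver sees every partial result.
    return sum(partials)
```

For example, `driver(["a", "bb", "ccc", "dddd"], 2)` splits the four records into two partitions, has each task total the record lengths in its own partition, and adds the two partial totals on the driver side.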

answered Nov 07 '18 at 23:26

A Khe

