Questions tagged [distributed-computing]

Using more than one computer, connected by a communication link, to accomplish a common task.

Distributed computing is a field of study that describes how multiple connected computing units can accomplish a common task. The combined computing power lets the system take on more work than a single unit could, and searches can be coordinated across units for efficiency. A successful result is usually credited to the unit or participant that found it.

Distributed computing projects include searching for large prime numbers and analysing DNA sequences.
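As a rough illustration of the idea described above, the sketch below splits a prime search into non-overlapping ranges and records which unit found each prime. It is only a sketch under stated assumptions: threads on one machine stand in for networked computers, and the names `split_range`, `search_chunk`, and `distributed_prime_search` are hypothetical, not taken from any real project.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n: int) -> bool:
    """Trial division; adequate for a small illustration."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def split_range(start: int, stop: int, parts: int) -> list[range]:
    """Cut [start, stop) into `parts` contiguous, non-overlapping chunks."""
    size, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        chunks.append(range(lo, hi))
        lo = hi
    return chunks


def search_chunk(worker_id: int, chunk: range) -> list[tuple[int, int]]:
    """One unit's share of the work: return (prime, worker_id) pairs."""
    return [(n, worker_id) for n in chunk if is_prime(n)]


def distributed_prime_search(start: int, stop: int, workers: int = 4):
    """Coordinate the search: hand each worker its own chunk, then merge."""
    chunks = split_range(start, stop, workers)
    # Assumption: a thread pool stands in for separate machines on a network.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(search_chunk, range(workers), chunks)
    found = [pair for part in results for pair in part]
    return sorted(found)  # sorted by prime; each entry names its finder


if __name__ == "__main__":
    for prime, finder in distributed_prime_search(1, 30, workers=3):
        print(f"{prime} found by worker {finder}")
```

Because the coordinator assigns disjoint ranges, no number is checked twice, and tagging each result with a worker id is one simple way to give the finder credit.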

2821 questions

413

votes

8 answers

Explaining Apache ZooKeeper

I am trying to understand ZooKeeper, how it works and what it does. Is there any application that is comparable to ZooKeeper? If you know of one, how would you describe ZooKeeper to a layman? I have tried the Apache wiki, the ZooKeeper SourceForge page... but I…

apache-zookeeper distributed-computing

asked Sep 07 '10 at 21:43

topgun_ivard

8,376
10
38
45

410

votes

20 answers

Spark - repartition() vs coalesce()

According to Learning Spark: Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoiding data movement, but only if you are decreasing the…

apache-spark distributed-computing rdd

asked Jul 24 '15 at 12:49

Praveen Sripati

32,799
16
80
117

296

votes

2 answers

What are workers, executors, cores in Spark Standalone cluster?

I read Cluster Mode Overview and I still can't understand the different processes in the Spark Standalone cluster and the parallelism. Is the worker a JVM process or not? I ran the bin\start-slave.sh and found that it spawned the worker, which is…

apache-spark distributed-computing

asked Sep 17 '15 at 03:06

Manikandan Kannan

8,684
15
44
65

239

votes

6 answers

What is the difference between cache and persist?

In terms of RDD persistence, what are the differences between cache() and persist() in Spark?

apache-spark distributed-computing rdd

asked Nov 11 '14 at 17:14

user1261215

131

votes

25 answers

Calculate the median of a billion numbers

If you have one billion numbers and one hundred computers, what is the best way to locate the median of these numbers? One solution which I have is: Split the set equally among the computers. Sort them. Find the medians for each set. Sort the sets…

algorithm distributed-computing

asked Apr 03 '10 at 13:32

anony

1,473
3
13
10

votes

4 answers

Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads

Can somebody please explain the following TensorFlow terms: inter_op_parallelism_threads and intra_op_parallelism_threads, or provide links to the right source of explanation? I have conducted a few tests by changing the parameters, but the…

python parallel-processing tensorflow distributed-computing

asked Dec 20 '16 at 01:33

itsamineral

1,369
3
14
19

votes

3 answers

2PC vs Sagas (distributed transactions)

I'm trying to deepen my understanding of distributed systems and how to maintain data consistency across such systems, where business transactions cover multiple services, bounded contexts and network boundaries. Here are two approaches which I know are…

transactions cloud microservices distributed-computing saga

asked Feb 21 '18 at 13:10

Tuomas Toivonen

21,690
47
129
225

votes

3 answers

Apache Spark vs Akka

Could you please tell me the difference between Apache Spark and Akka? I know that both frameworks are meant for programming distributed and parallel computations, yet I don't see the link or the difference between them. Moreover, I would like to get the…

apache-spark parallel-processing akka distributed-computing

asked Mar 16 '15 at 23:29

user4658980

votes

4 answers

Why isn't RDBMS Partition Tolerant in CAP Theorem and why is it Available?

Two points I don't understand about RDBMS being CA in the CAP Theorem: 1) It says RDBMS is not Partition Tolerant, but how is RDBMS any less Partition Tolerant than other technologies like MongoDB or Cassandra? Is there an RDBMS setup where we give up CA…

distributed-computing rdbms distributed-system cap-theorem nosql

asked Apr 04 '16 at 13:58

Glide

20,235
26
86
135

votes

5 answers

Difference between cloud computing and distributed computing?

I wanted to know the difference between cloud computing and distributed computing. I read an article about cloud computing and got the feeling that there is some relation between cloud computing and distributed computing, and so wanted to…

cloud distributed-computing

asked Aug 28 '09 at 23:36

Rachel

100,387
116
269
365

votes

4 answers

Service discovery vs load balancing

I am trying to understand in which scenario I should pick a service registry over a load balancer. From my understanding, both solutions cover the same functionality. For instance, if we consider consul.io, its feature list includes: Service…

web-services amazon-web-services cloud distributed-computing microservices

asked Oct 14 '15 at 12:36

Lucian Enache

2,510
5
34
59

votes

1 answer

"Eventual Consistency" vs "Strong Eventual Consistency" vs "Strong Consistency"?

I came across the concept of "Strong Eventual Consistency". Is it supposed to be stronger than "Eventual Consistency" but weaker than "Strong Consistency"? Could someone explain the differences among these three concepts with applicable…

distributed-computing

asked Apr 01 '15 at 01:37

njzhxf

votes

1 answer

What is spark.driver.maxResultSize?

The ref says: Limit of total size of serialized results of all partitions for each Spark action (e.g. collect). Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may…

apache-spark configuration driver communication distributed-computing

asked Aug 22 '16 at 20:06

gsamaras

71,951
46
188
305

votes

1 answer

Flattening Rows in Spark

I am doing some testing for Spark using Scala. We usually read JSON files which need to be manipulated like the following example: test.json: {"a":1,"b":[2,3]} val test = sqlContext.read.json("test.json") How can I convert it to the following…

scala apache-spark apache-spark-sql distributed-computing

asked Oct 02 '15 at 11:53

Nir Ben Yaacov

1,182
2
17
33

votes

2 answers

What is a task in Spark? How does the Spark worker execute the jar file?

After reading the documentation at http://spark.apache.org/docs/0.8.0/cluster-overview.html, I have some questions that I want to clarify. Take this example from Spark: JavaSparkContext spark = new JavaSparkContext( new…

apache-spark distributed-computing

asked Aug 13 '14 at 00:47

EdwinGuo

1,765
2
21
27
