Questions tagged [embarrassingly-parallel]

An embarrassingly parallel problem is one for which little or no effort is required to separate the problem into a number of parallel tasks. This is often the case when there is no dependency (or communication) between those parallel tasks.


These problems tend to require little or no communication of results between tasks, and are thus different from distributed computing problems that require communication between tasks, especially communication of intermediate results. They are easy to run on server farms that lack the special infrastructure used in a true supercomputer cluster.
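A minimal sketch of the pattern in Python, assuming only the standard library's `multiprocessing` module. The task function `simulate(seed)` is hypothetical: it depends on nothing but its own argument, so the tasks can be handed to a process pool with no communication beyond returning each result.

```python
import multiprocessing
import random


def simulate(seed: int) -> float:
    """Hypothetical independent task: a small Monte Carlo estimate of pi.

    It reads only its own argument and shares no state with other tasks,
    which is what makes the overall job embarrassingly parallel.
    """
    rng = random.Random(seed)  # per-task generator, no shared RNG state
    samples = 10_000
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_all(seeds, processes=None):
    """Map simulate() over seeds in a process pool; results keep input order."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(simulate, seeds)


if __name__ == "__main__":
    estimates = run_all(range(8))
    print(sum(estimates) / len(estimates))
```

Because each task is seeded independently, the parallel results are identical to running the same calls one after another.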

29 questions
87 votes, 5 answers

Solving embarrassingly parallel problems using Python multiprocessing

How does one use multiprocessing to tackle embarrassingly parallel problems? Embarrassingly parallel problems typically consist of three basic parts: Read input data (from a file, database, TCP connection, etc.). Run calculations on the input data,…
gotgenes
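The three parts named in the question (read input, run calculations, write results) can be sketched with `multiprocessing.Pool.imap`. This is one possible layout, not the accepted answer: the parent process does all reading and writing, and only the calculation runs in workers. The per-record calculation `compute` (summing the integers on a line) is a hypothetical placeholder.

```python
import io
import multiprocessing


def read_records(stream):
    """Part 1: lazily yield one input record per non-blank line."""
    for line in stream:
        line = line.strip()
        if line:
            yield line


def compute(record: str) -> str:
    """Part 2: hypothetical per-record calculation (sum of the integers on the line)."""
    total = sum(int(tok) for tok in record.split())
    return f"{record}\t{total}"


def run_pipeline(in_stream, out_stream, processes=None, chunksize=64):
    """Read in the parent, compute in workers, write in the parent (part 3).

    Only the parent touches the input and output streams, so workers never
    contend for the same file handle.
    """
    with multiprocessing.Pool(processes=processes) as pool:
        # imap yields results in input order; chunksize batches records
        # to reduce inter-process messaging overhead.
        for result in pool.imap(compute, read_records(in_stream), chunksize=chunksize):
            out_stream.write(result + "\n")


if __name__ == "__main__":
    src = io.StringIO("1 2 3\n4 5\n\n10\n")
    dst = io.StringIO()
    run_pipeline(src, dst, processes=2)
    print(dst.getvalue(), end="")
```

Swapping `io.StringIO` for real file objects (or a database cursor) leaves the pool logic unchanged; `imap_unordered` is an option when output order does not matter.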
10 votes, 4 answers

"embarrassingly parallel" programming using python and PBS on a cluster

I have a function (neural network model) which produces figures. I wish to test several parameters, methods and different inputs (meaning hundreds of runs of the function) from Python using PBS on a standard cluster with Torque. Note: I tried…
meduz
9 votes, 4 answers

Parallelize pandas apply

New to pandas, I already want to parallelize a row-wise apply operation. So far I found "Parallelize apply after pandas groupby". However, that only seems to work for grouped data frames. My use case is different: I have a list of holidays and for my…
Georg Heiler
9 votes, 4 answers

JVM (embarrassingly) parallel processing libraries/tools

I am looking for something that will make it easy to run (correctly coded) embarrassingly parallel JVM code on a cluster (so that I can use Clojure + Incanter). I have used Parallel Python in the past to do this. We have a new PBS cluster and our…
8 votes, 4 answers

Fastest way to run a single function in python in parallel for multiple parameters

Suppose I have a single function, processing. I want to run the same function multiple times for multiple parameters in parallel instead of sequentially, one after the other. def processing(image_location): image =…
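The body of `processing` is cut off in the excerpt, so the sketch below uses a hypothetical stand-in, `process_image`, that returns its path and a dummy score. It shows `concurrent.futures.ProcessPoolExecutor.map`, which, like the built-in `map`, zips several iterables; `itertools.repeat` supplies an argument that stays the same for every call.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat


def process_image(image_location: str, scale: float = 1.0) -> tuple:
    """Hypothetical stand-in for the question's processing(); the real body is not shown.

    Returns the input path and a dummy score so the example is self-contained.
    """
    return image_location, len(image_location) * scale


def process_all(locations, scale=1.0, max_workers=None):
    """Run process_image once per location in parallel, preserving input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # map() zips its iterables and stops at the shortest, so the
        # infinite repeat(scale) is safe here.
        return list(executor.map(process_image, locations, repeat(scale)))


if __name__ == "__main__":
    paths = ["a.png", "bb.png", "ccc.png"]
    for path, score in process_all(paths, scale=2.0):
        print(path, score)
```

For many very cheap calls, passing a larger `chunksize` to `executor.map` reduces the per-task overhead of sending work to the worker processes.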
7 votes, 3 answers

What is the best way to avoid overloading a parallel file-system when running embarrassingly parallel jobs?

We have a problem which is embarrassingly parallel - we run a large number of instances of a single program with a different data set for each; we do this simply by submitting the application many times to the batch queue with different parameters…
7 votes, 1 answer

multiprocessing - reading big input data - program hangs

I want to run parallel computation on some input data which is loaded from a file. (The file can be really big, so I use a generator for this.) On a certain number of items, my code runs OK but above this threshold the program hangs (some of the…
galapah
6 votes, 0 answers

Is there a Starcluster equivalent for Google Compute Engine (GCE) yet?

Does anyone know if there is a Starcluster equivalent for GCE? I have been quite happy using Starcluster with EC2 for embarrassingly parallel jobs. Now I want to try out GCE. I would be happy to contribute to whatever projects might be in the…
5 votes, 2 answers

Expected speedup from embarrassingly parallel task using Python Multiprocessing

I'm learning to use Python's Multiprocessing package for embarrassingly parallel problems, so I wrote serial and parallel versions for determining the number of primes less than or equal to a natural number n. Based on what I read from a blog post…
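A sketch of serial and parallel prime counting, assuming plain trial division (not necessarily the code from the question). The range [0, n] is split into independent sub-ranges whose counts are summed. Several chunks per process are used because, under trial division, larger numbers cost more to test, so equal-sized chunks do not take equal time. Even then, speedup is bounded by the number of physical cores, and for small n the cost of starting processes and pickling arguments can make the parallel version slower.

```python
import math
import multiprocessing
import time


def is_prime(k: int) -> bool:
    """Trial division; deliberately simple so each task is CPU-bound."""
    if k < 2:
        return False
    if k % 2 == 0:
        return k == 2
    for d in range(3, math.isqrt(k) + 1, 2):
        if k % d == 0:
            return False
    return True


def count_range(bounds) -> int:
    """Count primes in the half-open interval [lo, hi)."""
    lo, hi = bounds
    return sum(1 for k in range(lo, hi) if is_prime(k))


def count_primes_serial(n: int) -> int:
    """Number of primes <= n, computed in one process."""
    return count_range((0, n + 1))


def count_primes_parallel(n: int, processes=None, chunks_per_process=4) -> int:
    """Number of primes <= n, split into independent sub-ranges across a pool."""
    processes = processes or multiprocessing.cpu_count()
    n_chunks = max(1, processes * chunks_per_process)
    step = math.ceil((n + 1) / n_chunks)
    ranges = [(lo, min(lo + step, n + 1)) for lo in range(0, n + 1, step)]
    with multiprocessing.Pool(processes=processes) as pool:
        return sum(pool.map(count_range, ranges))


if __name__ == "__main__":
    n = 100_000
    for fn in (count_primes_serial, count_primes_parallel):
        start = time.perf_counter()
        result = fn(n)
        print(f"{fn.__name__}: {result} primes in {time.perf_counter() - start:.2f}s")
```

Timing both versions over a few values of n is a simple way to see where the pool overhead stops dominating.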
5 votes, 3 answers

Tools for setting up and running a grid job on Google Compute Engine?

I have the need to set up and run "embarrassingly" parallel jobs on Google Compute Engine. I am looking for tools to facilitate this. On EC2, I was using MIT's Starcluster to set up the cluster, and then just submitting the job to SGE. Are there…
5 votes, 1 answer

Memory-intense jobs scaling poorly on multi-core cloud instances (ec2, gce, rackspace)?

Has anyone else noticed terrible performance when scaling up to use all the cores on a cloud instance with somewhat memory intense jobs (2.5GB in my case)? When I run jobs locally on my quad xeon chip, the difference between using 1 core and all 4…
2 votes, 3 answers

Embarrassingly parallel workflow creates too many output files

On a Linux cluster I run many (N > 10^6) independent computations. Each computation takes only a few minutes and the output is a handful of lines. When N was small I was able to store each result in a separate file to be parsed later. With large N…
Hooked
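One common way to cut the file count is to have each cluster job run a batch of computations and write all of its results to a single JSON-lines file. The sketch below is a hypothetical illustration, not taken from the question's answers. `compute` stands in for one computation, and `job_index` is whatever array-job index the scheduler provides (how it is exposed depends on the scheduler, so here it is a plain argument). With N = 10^6 tasks and a batch size of 1000, this yields 1000 files instead of 10^6.

```python
import json
import os
import tempfile


def compute(task_id: int) -> list:
    """Hypothetical stand-in for one independent computation that yields a few lines."""
    return [f"task {task_id} line {i}: {task_id * i}" for i in range(3)]


def batch_for(job_index: int, batch_size: int, total: int) -> range:
    """Task IDs assigned to one array job (job_index is assumed to start at 0)."""
    start = job_index * batch_size
    return range(start, min(start + batch_size, total))


def run_batch(task_ids, out_path: str) -> int:
    """Run many small tasks in one job and write each result as one JSON line.

    Writing to a temporary file and renaming it at the end means a job that
    dies halfway leaves no partially written output under the final name.
    """
    out_dir = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".part")
    count = 0
    with os.fdopen(fd, "w") as fh:
        for task_id in task_ids:
            record = {"task": task_id, "lines": compute(task_id)}
            fh.write(json.dumps(record) + "\n")
            count += 1
    os.replace(tmp_path, out_path)  # atomic rename within the same filesystem
    return count
```

Parsing later becomes a loop over a modest number of files, each read line by line with `json.loads`.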
1 vote, 0 answers

Python Bias Correction using Xarray's apply_ufunc and embarrassingly parallel computing with Dask

I am quite new to parallel computing and Dask. In addition, this is my first question here on Stack Overflow and I hope everything will work. Problem Description I want to set up a bias correction of climate data (e.g. total precipitation, tp). For…
1 vote, 1 answer

What does jug status 'Active' mean, and why does it not equal the number of procs requested?

I've been unable to find what status 'Active' tasks are. I'm using JUG 2.1.1, and I don't see that word appear anywhere in the manual, except in a footnote about 'active-wait'. I'm using an LSF array to run a large number (hundreds of thousands) of…
LGS
1 vote, 1 answer

Matlab dfeval overhead

I have an embarrassingly parallel job that requires no communication between the workers. I'm trying to use the dfeval function, but the overhead seems to be enormous. To get started, I'm trying to run the example from the documentation. >>…
MatlabSorter