Questions tagged [parallelism-amdahl]

Amdahl's law, also known as Amdahl's argument, is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors. The law is named after computer architect Gene Amdahl, and was presented at the AFIPS Spring Joint Computer Conference in 1967.

The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours using a single processor core, and a particular portion of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (95%) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence the theoretical speedup is limited to at most 20×.
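The 20× bound in this example follows directly from the formula. A minimal sketch in Python, assuming the usual statement of the law with a parallel fraction p and processor count n (the function name is ours):

    def amdahl_speedup(p, n):
        """Maximum overall speedup when a fraction p of the work
        can be parallelized across n processors (Amdahl's law)."""
        return 1.0 / ((1.0 - p) + p / n)

    # The 20-hour program above: 95% parallelizable, one hour serial.
    for n in (1, 2, 16, 256, 4096):
        print(f"n = {n:4d}  speedup = {amdahl_speedup(0.95, n):5.2f}")

    # As n grows, the speedup approaches 1 / (1 - 0.95) = 20, so no
    # number of processors can push it past 20x.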

106 questions
12 votes · 1 answer

Python multiprocessing performance only improves with the square root of the number of cores used

I am attempting to implement multiprocessing in Python (Windows Server 2012) and am having trouble achieving the degree of performance improvement that I expect. In particular, for a set of tasks which are almost entirely independent, I would expect…
KPM
9 votes · 2 answers

Why isn't N independent calculations N times faster on N threads?

I have an N-core processor (4 in my case). Why isn't N totally independent function calls on N threads roughly N times faster (of course there is an overhead of creating threads, but read further)? Look at the following code: namespace ch =…
krispet krispet
8 votes · 2 answers

Amdahl's law and GPU

I have a couple of doubts regarding the application of Amdahl's law with respect to GPUs. For instance, I have a kernel code that I have launched with a number of threads, say N. So, in Amdahl's law, the number of processors will be N, right? Also,…
Anirudh Kaushik
7 votes · 1 answer

Chapel-Python integration questions

I'm trying to see if I can use Chapel for writing parallel code for use in a Python-based climate model: https://github.com/CliMT/climt I don't have any experience with Chapel, but it seems very promising for my use-case. I had a few questions about…
7 votes · 2 answers

How to find an optimum number of processes in GridSearchCV( ..., n_jobs = ... )?

I'm wondering which is better to use with GridSearchCV( ..., n_jobs = ... ) to pick the best parameter set for a model: n_jobs = -1, or n_jobs with a big number like n_jobs = 30? Based on the Sklearn documentation: n_jobs = -1 means that the…
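For context, a hedged sketch of the choice this question asks about, using scikit-learn's documented convention that n_jobs=-1 means "use all available processors" (the estimator, grid, and dataset here are only illustrative):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

    # n_jobs=-1 lets joblib use every core it can find; a fixed
    # n_jobs far above the physical core count adds scheduling and
    # memory overhead without creating any extra parallelism.
    search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1)
    search.fit(X, y)
    print(search.best_params_)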
7 votes · 2 answers

pathos: parallel processing options - Could someone explain the differences?

I am trying to run parallel processes under python (on ubuntu). I started using multiprocessing and it worked fine for simple examples. Then came the pickle error, and so I switched to pathos. I got a little confused with the different options and…
6 votes · 2 answers

Poor scaling of multiprocessing Pool.map() on a list of large objects: How to achieve better parallel scaling in python?

Let us define: from multiprocessing import Pool import numpy as np def func(x): for i in range(1000): i**2 return 1 Notice that func() does something and it always returns a small number 1. Then, I compare an 8-core parallel…
6 votes · 1 answer

OpenCL code in MQL5 does not distribute jobs to each GPU core

I have created a GPU-based indicator for the MetaTrader Terminal platform, using OpenCL and MQL5. I have tried hard to get my [ MetaTrader Terminal: Strategy Tester ] optimization job to offload as much of the work as possible onto the GPU. Most of the calculations are…
Jaffer Wilson
5 votes · 3 answers

Amdahl's Law examples

Amdahl's Law states that the maximal speedup of a computation, where the fraction S of the computation must be done sequentially, going from a 1-processor system to an N-processor system is at most 1 / (S + [(1 - S) / N]). Does…
OTO
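The formula quoted in this question is easy to tabulate. A short illustrative sketch (values chosen by us) showing how quickly the bound saturates toward its limit 1/S as N grows:

    def max_speedup(S, N):
        # Amdahl's law as quoted: serial fraction S, N processors.
        return 1.0 / (S + (1.0 - S) / N)

    for S in (0.5, 0.1, 0.01):
        row = [round(max_speedup(S, N), 1) for N in (2, 8, 64, 1024)]
        print(f"S = {S:4}: {row}  (limit 1/S = {1.0 / S:.0f})")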
5 votes · 0 answers

cv::parallel_for_: not a very big improvement

I'm testing the class cv::ParallelLoopBody for image processing code. I started by implementing normalization, where I have to divide all the pixels by certain values for each channel, which is a nice, easily parallelized piece of code. However, when…
Ja_cpp
5 votes · 2 answers

improving bigint write to disk performance

I am working with really large bigint numbers and I need to write them to disk and read them back later because they won't all fit in memory at a time. The current Chapel implementation first converts the bigint to a string and then writes that…
zx228
5 votes · 2 answers

Expected speedup from embarrassingly parallel task using Python Multiprocessing

I'm learning to use Python's Multiprocessing package for embarrassingly parallel problems, so I wrote serial and parallel versions for determining the number of primes less than or equal to a natural number n. Based on what I read from a blog post…
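A minimal serial-versus-parallel sketch of the experiment this question describes, using multiprocessing.Pool; the helper names and the trial-division primality test are illustrative, not taken from the question:

    import time
    from multiprocessing import Pool

    def is_prime(k):
        """Trial division; deliberately CPU-bound per call."""
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    def count_serial(n):
        return sum(is_prime(k) for k in range(2, n + 1))

    def count_parallel(n, workers=4):
        # chunksize batches tasks so IPC overhead stays small
        # relative to the useful work in each batch.
        with Pool(workers) as pool:
            return sum(pool.map(is_prime, range(2, n + 1), 1000))

    if __name__ == "__main__":  # required for Windows process spawning
        n = 200_000
        t0 = time.perf_counter(); s = count_serial(n)
        t1 = time.perf_counter(); p = count_parallel(n)
        t2 = time.perf_counter()
        assert s == p
        print(f"serial {t1 - t0:.2f}s  parallel {t2 - t1:.2f}s")

With genuinely independent tasks like these, the measured speedup should approach the worker count once n is large enough to amortize pool startup; a markedly worse curve usually points at a hidden serial fraction or at communication overhead.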
4 votes · 1 answer

Efficient collection and transfer of scattered sub-arrays in Chapel

Recently, I came across Chapel. I liked the examples given in the tutorials but many of them were embarrassingly parallel in my eyes. I'm working on Scattering Problems in Many-Body Quantum Physics and a common problem can be reduced to the…
4 votes · 2 answers

CyclicDist goes slower on multiple locales

I tried doing an implementation of matrix multiplication using the CyclicDist module. When I test with one locale vs two locales, the one-locale run is much faster. Is it because the time to communicate between the two Jetson nano boards is really big, or is…
4 votes · 1 answer

Why does joblib.Parallel() take much more time than a non-paralleled computation? Shouldn't Parallel() run faster than a non-paralleled computation?

The joblib module provides a simple helper class to write parallel for loops using multiprocessing. This code uses a list comprehension to do the job: import time from math import sqrt from joblib import Parallel, delayed start_t =…
user11566345
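For reference, a minimal version of the comparison this question makes, using joblib's documented Parallel/delayed API (the workload size is illustrative). Because each sqrt call does only microseconds of work, dispatch and pickling overhead dominates, which is exactly the behaviour the question observes:

    import time
    from math import sqrt
    from joblib import Parallel, delayed

    N = 100_000

    # Plain list comprehension: everything stays in one process.
    t0 = time.perf_counter()
    serial = [sqrt(i ** 2) for i in range(N)]
    t_serial = time.perf_counter() - t0

    # joblib: every tiny sqrt call is shipped to a worker, so the
    # per-call overhead swamps the actual computation.
    t0 = time.perf_counter()
    parallel = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(N))
    t_parallel = time.perf_counter() - t0

    assert serial == parallel
    print(f"serial {t_serial:.3f}s  joblib {t_parallel:.3f}s")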