Questions tagged [supercomputers]

Supercomputers are a class of highly specialised hardware infrastructure in which a large number of machines are pre-organised and linked together by specialised high-speed, low-latency interconnects, so that new forms of concurrent processing can be orchestrated across them. Having such an infrastructure is not enough: it is equally important to use system tools capable of harnessing as much of the available CPU power as possible.

Supercomputers first began to appear in the 1960s.

These early supercomputers had only a single, high-speed processor. Control Data Corporation's CDC-6600, designed by Seymour Cray, was about ten times faster than all other computers of its day, and was dubbed a supercomputer -- the first appearance of the term.

Later, as processing speed, cooling ability, and physical size hit limits, Cray pioneered the method of linking multiple processors together in order to get more speed out of the same machine. This is the same method used in today's supercomputers, which can range in size from thousands of processing cores to hundreds of thousands of processing cores.

*  Seymour CRAY ( yes, the supercomputer guy )
*  said:
*  --------------------------------------------------------------------
*  A supercomputer turns compute-bound problems into I/O bound problems
*  --------------------------------------------------------------------
*  and:
*  --------------------------------------------------------------------
*  It is not hard to build a fast processor or a fast memory,
*  but the challenge is to build a fast system.
*  --------------------------------------------------------------------

Interconnect latency is an additional [TIME]-domain penalty that each process has to pay for using a supercomputer's remote resources under a distributed computation-graph schedule.

Minimising the interconnect's latency costs is thus one natural direction; using a smarter, overhead-aware computation-graph design is the other. Together they push a supercomputing infrastructure towards the I/O-bound limit, the bleeding edge of its achievable performance.
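The latency penalty described above can be illustrated with a rough cost sketch. This is a minimal example under a simple assumed model, in which each message costs a fixed latency plus its size divided by the link bandwidth; the function names and all numbers are hypothetical and chosen only for illustration, not measured on any real interconnect.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Assumed cost model: fixed per-message latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def total_time(compute_s, n_messages, message_bytes, latency_s, bandwidth_bytes_per_s):
    """Wall time of one process that computes, then sends n_messages messages."""
    per_message = transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)
    return compute_s + n_messages * per_message


# Hypothetical values, for illustration only
LATENCY = 1e-6         # seconds paid per message
BANDWIDTH = 1e10       # bytes per second
PAYLOAD = 1_000_000    # total bytes to move

# The same payload sent as 1000 small messages or as a single large one
many_small = total_time(0.0, 1000, PAYLOAD // 1000, LATENCY, BANDWIDTH)
one_large = total_time(0.0, 1, PAYLOAD, LATENCY, BANDWIDTH)

print(f"1000 small messages: {many_small:.6f} s")
print(f"1 large message:     {one_large:.6f} s")
```

Under these assumptions the many-small-messages schedule pays the latency a thousand times, which is why an overhead-aware design tends to batch communication where the algorithm allows it.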


91 questions
24
votes
4 answers

What is the difference between a Cluster and MPP supercomputer architecture?

Jimwalks
  • 315
  • 1
  • 2
  • 9
14
votes
8 answers

Raspberry Pi cluster, neuron networks and brain simulation

Since the RBPI (Raspberry Pi) has very low power consumption and very low production price, it means one could build a very big cluster with those. I'm not sure, but a cluster of 100000 RBPI would take little power and little room. Now I think it…
jokoon
  • 6,207
  • 11
  • 48
  • 85
6
votes
5 answers

Is this practical on a small supercomputer?

I'm investigating WEP and as part of that, I'm toying with the RC4 algorithm. I'm trying to decide if an inverse table is feasible to write (although large... I don't have the space and I don't intend to write one). For that, I've decided to check…
Ryan Amos
  • 5,422
  • 4
  • 36
  • 56
6
votes
3 answers

Efficiently computing floating-point arithmetic hundreds of thousands of times in Bash

Background: I work for a research institute that studies storm surges computationally, and am attempting to automate some of the HPC commands using Bash. Currently, the process is we download the data from NOAA and create the command file manually,…
Jonathan E. Landrum
  • 2,748
  • 4
  • 30
  • 46
5
votes
2 answers

What is causing my random: "joblib.externals.loky.process_executor.TerminatedWorkerError" errors?

I'm doing GIS-based data analysis, in which I calculate wide-area, nationwide prediction maps (e.g. weather maps). Because my target area is very big (the whole country), I am using supercomputers (Slurm) and parallelization to calculate the…
jjepsuomi
  • 4,223
  • 8
  • 46
  • 74
5
votes
1 answer

How to run several commands in one PBS job submission

I have written a code that takes only 1-4 cpus. But when I submit a job on the cluster, I have to take at least one node with 16 cores per job. So I want to run several simulations on each node with each job I submit. I was wondering if there is a…
solora
  • 93
  • 1
  • 6
4
votes
1 answer

Weird "Stale file handle, errno=116" on remote cluster after dozens of hours running

I'm now running a simulation code called CMAQ on a remote cluster. I first ran a benchmark test in serial to see the performance of the software. However, the job always runs for dozens of hours and then crashes with the following "Stale file…
Shangxin
  • 41
  • 1
  • 2
4
votes
2 answers

What is scratch space /filesystem in HPC

I am studying about HPC applications and Parallel Filesystems. I came across the term scratch space AND scratch filesystem. I cannot visualize where this scratch space exists. Is it on the compute node as a mounted filesystem /scratch or on the…
RootPhoenix
  • 1,626
  • 1
  • 22
  • 40
4
votes
2 answers

Is the PVM (parallel virtual machine) library widely used in HPC?

Has everyone migrated to MPI (message passing interface) or is PVM still widely used in supercomputers and HPC?
joemoe
  • 5,734
  • 10
  • 43
  • 60
4
votes
2 answers

PBS script -o file to multiple locations

Sometimes when I run jobs on a PBS cluster, I'd really like the joblog (-o file) in two places. One in the $PBS_O_WORKDIR for keeping everything together and one in ${HOME}/jobOuts/ for greping/awking/etc... Doing a test from the command line works with…
caddymob
  • 317
  • 1
  • 3
  • 10
3
votes
1 answer

How to compute the diameter of 3D torus interconnect?

A 3D torus interconnect is a network topology having p^3 nodes where p > 2. A 3D torus is basically a 3D mesh with links connecting nodes on opposite faces (Am I right?). The bisection width calculated by me comes out to be 2p^2. However, I am…
3
votes
4 answers

Orbital equations, and power required to run them

Due to a discussion on the SO IRC today, I'm curious about orbital mechanics, and: (1) the equations needed to solve orbital problems; (2) the computing power required to solve complex problems. The question in particular is calculating when the Earth will…
Adam Davis
  • 91,931
  • 60
  • 264
  • 330
3
votes
0 answers

How can I remove '\r' in a loop to read files in python

Does anyone know how to get rid of '\r'? I have to run 89 files on the supercomputer, and so far I have written the following two scripts. The first one is a .sh script to run in the shell: $ID=$(sed "${PBS_ARRAYID}q;d" ID_index.txt) $echo "Processing $ID" $python…
Ale Lope
  • 33
  • 3
3
votes
1 answer

Slurm: how many times will failed jobs be --requeue'd

I have a Slurm job array for which the job file includes a --requeue directive. Here is the full job file: #!/bin/bash #SBATCH --job-name=catsss #SBATCH --output=logs/cats.log #SBATCH --array=1-10000 #SBATCH --requeue #SBATCH…
duhaime
  • 25,611
  • 17
  • 169
  • 224
3
votes
0 answers

What is the best way to set this working environment for my research group?

We recently got a supercomputer (I will call it the "cluster"; it has 4 GPUs and a 12-core processor with some decent storage and RAM) for our lab for machine learning research. A Linux distro (most likely CentOS or Ubuntu depending on your…
D_Serg
  • 464
  • 2
  • 12