Questions tagged [superscalar]

8 questions
8 votes, 1 answer

Hyperthreading vs. Superscalar execution

Imagine a CPU (or core) that is superscalar (multiple execution units) and also has hyperthreading (SMT) support. Why is the number of software threads the CPU can truly execute in parallel typically given by the number of logical cores (i.e.…
AdmiralAdama
4 votes, 1 answer

ARM Cortex-M7 assembly timing on simple delay loop - how to explain results?

Since AFAIK cycle timings are not published, I've decided to try to measure the cycle count using the DWT counter on an STM32H750-DK; as a first example, I'm measuring a simple delay loop. It seems that two instructions can be executed by the Cortex-M7 in each…
user2064070
2 votes, 0 answers

Six stage pipelining with superscalar processor with two execution units

I need help designing a six-stage pipeline for a superscalar processor with two execution units. The six stages are Instruction Fetch (IF), Instruction Decode (ID), Read from Registers (RR), 2-cycle Execution (EX), and Write back result (WB). Instructions…
Dr. Debasish Jana
2 votes, 0 answers

Why are name dependencies (WaR, WaW) in ILP architectures problematic?

Assume the following two instructions are executed simultaneously:

    addi $t0, $t1, 4
    addi $t1, $t2, 4

It's an anti-dependence, or Write-after-Read. Assuming they are executed at the same time, wouldn't the first instruction still read the correct…
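As a rough illustration of the scenario in this question, here is a toy Python model of the two `addi` instructions. It is only a sketch: the initial register values, the dict-based register file, and the helper names (`run_sequential`, `run_same_cycle`) are assumptions made up for this example, not a description of how any real core schedules instructions.

```python
# Toy model of the two-instruction example; register values are hypothetical.
PROGRAM = [
    ("$t0", "$t1", 4),  # addi $t0, $t1, 4
    ("$t1", "$t2", 4),  # addi $t1, $t2, 4
]


def initial_regs():
    """Return a fresh register file with made-up starting values."""
    return {"$t0": 0, "$t1": 10, "$t2": 20}


def run_sequential(order):
    """Execute the instructions one at a time in the given order."""
    regs = initial_regs()
    for i in order:
        dst, src, imm = PROGRAM[i]
        regs[dst] = regs[src] + imm
    return regs


def run_same_cycle():
    """Read every source operand first, then perform every write."""
    regs = initial_regs()
    results = [(dst, regs[src] + imm) for dst, src, imm in PROGRAM]
    for dst, value in results:
        regs[dst] = value
    return regs


in_order = run_sequential([0, 1])    # program order
same_cycle = run_same_cycle()        # both issued together
swapped = run_sequential([1, 0])     # second instruction completes first

print(in_order)
print(same_cycle)
print(swapped)
```

In this model the same-cycle case matches program order, because both reads happen before either write. The result only changes in the swapped case, where the write to `$t1` lands before the first instruction has read it.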
1 vote, 1 answer

Relation between CPI and number of execution units when looking at SIMD intrinsics

I understand that the term Cycles Per Instruction (CPI) closely relates to the superscalarity of the processor, a term which I have not fully understood. According to Wikipedia, "...a superscalar processor can execute more than one instruction during a…
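For readers unfamiliar with the metric, CPI is the average number of clock cycles spent per retired instruction, and its reciprocal is instructions per cycle (IPC). The sketch below uses hypothetical cycle and instruction counts (not measurements from the question) to show that a core sustaining more than one instruction per cycle ends up with a CPI below 1.

```python
def cpi(cycles, instructions):
    """Average cycles per instruction over a measured run."""
    return cycles / instructions


def ipc(cycles, instructions):
    """Average instructions per cycle: the reciprocal of CPI."""
    return instructions / cycles


# Hypothetical counts: 1000 instructions retired in 500 cycles.
cycles, instructions = 500, 1000
print(cpi(cycles, instructions))
print(ipc(cycles, instructions))
```

With these made-up numbers the average is two instructions per cycle, which is only reachable on a core that can issue at least two instructions in the same cycle.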
1 vote, 2 answers

Interpreting Absurdly-Low Measured Latency in Careful Profile (Superscalarity Effects?)

I've written some code for profiling small functions. At a high level, it:

- Sets the thread affinity to only one core and the thread priority to maximum.
- Computes statistics from doing the following 100 times:
  - Estimate the latency of a function…
geometrian
0 votes, 0 answers

Odd Style for Instruction Parallelism

    final long s0 = this.s0;
    long s1 = this.s1;
    final long result = s0 + s1;
    s1 ^= s0;
    this.s0 = Long.rotateLeft(s0, 24) ^ s1 ^ s1 << 16;
    this.s1 = Long.rotateLeft(s1, 37);
    return result;

Does…
0 votes, 1 answer

Super-scaling vs Pipe-lining Performance

A superscalar CPU is typically also pipelined. Why, then, are pipelining and superscalar execution considered different performance-enhancement techniques?