Questions tagged [cpu-architecture]

The hardware architecture and ISA (x86, x86_64, ARM, ...) and the micro-architectural implementation of a CPU or microcontroller.

Some of the key architectures:

  • arm - 32-bit Advanced RISC Machine.
  • arm64 - 64-bit Advanced RISC Machine.
  • ia32 - 32-bit Intel Architecture.
  • mips - 32-bit MIPS, big-endian.
  • mipsel - 32-bit MIPS, little-endian.
  • ppc - 32-bit PowerPC Architecture.
  • ppc64 - 64-bit PowerPC Architecture.

Use this tag for questions about the features, bugs, and inner workings of specific CPU architectures.

3996 questions
27072 votes, 25 answers

Why is processing a sorted array faster than processing an unsorted array?

In this C++ code, sorting the data (before the timed region) makes the primary loop ~6x faster: #include <algorithm> #include <ctime> #include <iostream> int main() { // Generate data const unsigned arraySize = 32768; int…
GManNickG
  • 494,350
  • 52
  • 494
  • 543
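
A minimal sketch of the kind of loop this question is about, not the asker's exact code: the function names and the `threshold` parameter are hypothetical, chosen for illustration. The first function contains a data-dependent branch, which a branch predictor handles well when the input is sorted and poorly when it is random. The second computes the same sum without a branch in the source. Whether the compiler actually emits a branch for either version depends on the compiler and its flags.

```cpp
#include <algorithm>  // std::sort, for callers that want to sort the input first
#include <cstdint>
#include <vector>

// Sum every element that is at least `threshold`.
// The `if` depends on the data, so its predictability depends on the input order.
std::int64_t sum_at_least(const std::vector<int>& data, int threshold) {
    std::int64_t sum = 0;
    for (int v : data) {
        if (v >= threshold) {
            sum += v;
        }
    }
    return sum;
}

// Same result, written without a branch in the source:
// the comparison yields 0 or 1, which scales the value before it is added.
std::int64_t sum_at_least_branchless(const std::vector<int>& data, int threshold) {
    std::int64_t sum = 0;
    for (int v : data) {
        sum += static_cast<std::int64_t>(v) * (v >= threshold);
    }
    return sum;
}
```

Sorting the vector with `std::sort` before calling either function changes only the order in which the comparison outcomes occur, never the returned sum.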
696 votes, 4 answers

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

How can the theoretical peak performance of 4 floating point operations (double precision) per cycle be achieved on a modern x86-64 Intel CPU? As far as I understand it takes three cycles for an SSE add and five cycles for a mul to complete on most…
user1059432
  • 7,518
  • 3
  • 19
  • 16
344 votes, 4 answers

Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs

I've been racking my brain for a week trying to complete this assignment and I'm hoping someone here can lead me toward the right path. Let me start with the instructor's instructions: Your assignment is the opposite of our first lab assignment,…
Cowmoogun
  • 2,507
  • 4
  • 12
  • 17
319 votes, 7 answers

Why does this code execute more slowly after strength-reducing multiplications to loop-carried additions?

I was reading Agner Fog's optimization manuals, and I came across this example: double data[LEN]; void compute() { const double A = 1.1, B = 2.2, C = 3.3; int i; for(i=0; i…
ttsiodras
  • 10,602
  • 6
  • 55
  • 71
278 votes, 3 answers

What is a retpoline and how does it work?

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to perform indirect calls through a so-called…
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
263 votes, 5 answers

How does the ARM architecture differ from x86?

Is the x86 Architecture specially designed to work with a keyboard while ARM expects to be mobile? What are the key differences between the two?
user1922878
  • 2,833
  • 3
  • 13
  • 7
257 votes, 7 answers

Difference between core and processor

What is the difference between a core and a processor? I've already looked for it on Google, but I only get definitions for multi-core and multi-processor, which is not what I am looking for.
Saad Achemlal
  • 3,616
  • 5
  • 16
  • 17
249 votes, 3 answers

How much of ‘What Every Programmer Should Know About Memory’ is still valid?

I am wondering how much of Ulrich Drepper's What Every Programmer Should Know About Memory from 2007 is still valid. Also I could not find a newer version than 1.0 or an errata. (Also in PDF form on Ulrich Drepper's own site:…
Framester
  • 33,341
  • 51
  • 130
  • 192
236 votes, 4 answers

What is the purpose of the "Prefer 32-bit" setting in Visual Studio and how does it actually work?

It is unclear to me how the compiler will automatically know to compile for 64-bit when it needs to. How does it know when it can confidently target 32-bit? I am mainly curious about how the compiler knows which architecture to target when…
Aaron
  • 10,386
  • 13
  • 37
  • 53
217 votes, 10 answers

What is the difference between Trap and Interrupt?

What is the difference between Trap and Interrupt? If the terminology is different for different systems, then what do they mean on x86?
David
  • 3,190
  • 8
  • 25
  • 31
209 votes, 13 answers

Why is a boolean 1 byte and not 1 bit of size?

In C++, why is a boolean 1 byte and not 1 bit in size? Why aren't there types like 4-bit or 2-bit integers? I'm missing these things when writing an emulator for a CPU
Asm
  • 2,101
  • 2
  • 13
  • 4
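
As a hedged illustration of the usual workaround: a byte is the smallest addressable unit in C++, so each `bool` object occupies at least one byte, but several flags can be packed into a single byte by hand. The `FlagByte` type below is a hypothetical helper written for this sketch, not a standard library facility.

```cpp
#include <cstdint>

// Hypothetical helper: eight boolean flags stored in the bits of one byte.
struct FlagByte {
    std::uint8_t bits = 0;

    // Set or clear flag number `i` (0..7).
    void set(unsigned i, bool on) {
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << i);
        if (on) {
            bits = static_cast<std::uint8_t>(bits | mask);
        } else {
            bits = static_cast<std::uint8_t>(bits & ~mask);
        }
    }

    // Read flag number `i` (0..7).
    bool test(unsigned i) const {
        return ((bits >> i) & 1u) != 0;
    }
};
```

Individual flags inside `bits` have no address of their own, which is the trade-off for the smaller storage; an emulator would typically model a status register this way.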
190 votes, 4 answers

What happens when a computer program runs?

I know the general theory but I can't fit in the details. I know that a program resides in the secondary memory of a computer. Once the program begins execution it is entirely copied to the RAM. Then the processor retrieves a few instructions (it…
gaijinco
  • 2,146
  • 4
  • 17
  • 16
179 votes, 2 answers

What is difference between sjlj vs dwarf vs seh?

I can't find enough information to decide which compiler should I use to compile my project. There are several programs on different computers simulating a process. On Linux, I'm using GCC. Everything is great. I can optimize code, it compiles fast…
sorush-r
  • 10,490
  • 17
  • 89
  • 173
175 votes, 1 answer

Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?

I discovered this popular ~9-year-old SO question and decided to double-check its outcomes. So, I have AMD Ryzen 9 5950X, clang++ 10 and Linux, I copy-pasted code from the question and here is what I got: Sorted - 0.549702s: ~/d/so_sorting_faster$…
DimanNe
  • 1,791
  • 3
  • 12
  • 19
170 votes, 5 answers

Write-back vs Write-Through caching?

My understanding is that the main difference between the two methods is that in "write-through" method data is written to the main memory through the cache immediately, while in "write-back" data is written in a "later time". We still need to wait…
triple fault
  • 13,410
  • 8
  • 32
  • 45
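
The distinction in this question can be sketched with a toy model, assuming a single cache line in front of a single memory word. `ToyCacheLine` and all of its members are hypothetical names for illustration; real caches track tags, sets, and coherence state that this sketch leaves out.

```cpp
#include <cstdint>

enum class WritePolicy { WriteThrough, WriteBack };

// Toy model: one cache line backed by one word of "main memory".
struct ToyCacheLine {
    WritePolicy policy;
    std::uint32_t cached = 0;   // value held in the cache line
    std::uint32_t memory = 0;   // stand-in for main memory
    bool dirty = false;         // only meaningful for write-back
    int memory_writes = 0;      // how many times memory was updated

    explicit ToyCacheLine(WritePolicy p) : policy(p) {}

    // A store always updates the cache. Write-through also updates memory
    // immediately; write-back only marks the line dirty.
    void store(std::uint32_t value) {
        cached = value;
        if (policy == WritePolicy::WriteThrough) {
            memory = value;
            ++memory_writes;
        } else {
            dirty = true;
        }
    }

    // On eviction, a dirty write-back line is flushed to memory.
    void evict() {
        if (dirty) {
            memory = cached;
            ++memory_writes;
            dirty = false;
        }
    }
};
```

In this model, three stores to a write-through line cause three memory writes, while three stores to a write-back line cause a single memory write when the line is evicted.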