
I've been reading about data-oriented programming in the context of Entity Component Systems. Apparently, using a struct of arrays makes much better use of the CPU cache and can give substantial performance increases: if all of the data you're iterating over is contiguous, you benefit from cache locality.

Because I'll be working in JavaScript, I figured I'd first devise a small benchmark to see how much of a performance increase is possible under ideal conditions. I kept it very simple: the first test benchmarks the speed of iterating over an array of structs, and the second benchmarks the speed of iterating over a struct of arrays.

Here is the code:

function randomInt() { return Math.floor(Math.random() * 100) + 1; }
function randomStr() { return Math.random().toString(36).substring(7); }

let samples = 1000;
let count = 10000000;

function benchmarkArrayOfStructs() {
  let AOS = [];

  for (let i = 0; i < count; i++) {
    AOS.push({ health: randomInt(), name: randomStr(), damage: randomInt() });
  }

  let t1 = performance.now();
  
  let sum = 0;

  for (let x = 0; x < samples; x++) {
    for (let i = 0; i < AOS.length; i++) {
      let item = AOS[i];
    
      sum += item.health + item.damage;
    }
  }
    
  console.log(performance.now() - t1);
}

function benchmarkStructOfArrays() {
  let SOA = { health: [], name: [], damage: [] };

  for (let i = 0; i < count; i++) {
    SOA.health.push(randomInt());
    SOA.name.push(randomStr());
    SOA.damage.push(randomInt());
  }

  let t2 = performance.now();
  
  let sum = 0;

  let h = SOA.health;
  let d = SOA.damage;

  for (let x = 0; x < samples; x++) {
    for (let i = 0; i < count; i++) {  
      sum += h[i] + d[i];
    }
  }

  console.log(performance.now() - t2);
}

benchmarkArrayOfStructs();
benchmarkStructOfArrays();

Interestingly, the struct-of-arrays version is only about 20% faster than the array-of-structs version. In the various talks I've watched, speakers have claimed 10x speed increases for this type of operation, and intuitively I feel the struct-of-arrays version should be much faster, but it isn't. Now I'm beginning to wonder whether this sort of optimization is even worth integrating into my project, as it severely decreases ergonomics. Have I done something wrong in my benchmark, or is this the expected speedup?

  • Synthetic benchmarks are rarely a good guide to what will happen with your real code in real situations; JavaScript engines apply different optimization strategies (or don't apply any at all) based on how the code is used. The benchmark above is quite simplistic (you'd want to do lots of data-retrieval tests, not just one), but that's not the primary problem. The primary problem is that it's a synthetic benchmark in the first place. – T.J. Crowder Jan 24 '21 at 14:26
  • I'm aware of the pitfalls of benchmarking in general, but this one seems simple enough to be explanatory. Do you have any advice on how to improve the benchmark? – Ryan Peschel Jan 24 '21 at 14:27
  • As I said, at a minimum you'd repeat the test (within the test, not separate runs) -- a **lot**. (Like, 10,000+ times.) But are you having a significant performance problem with your project's current code? – T.J. Crowder Jan 24 '21 at 14:28
  • I'll attempt to modify the code example to do that. There is also no current code to speak of. I'm attempting to design a minimal Entity Component System in JavaScript for a project, and I'm using this to determine whether or not it's worth implementing it as a Struct of Arrays versus an Array of Structs. The latter is significantly more ergonomic, so I'd prefer it, but if the former is as performant as others claim, I may have to sacrifice ergonomics. I'm trying to determine if it's worth it. – Ryan Peschel Jan 24 '21 at 14:30
  • I've updated the code to take into account samples, but it is still the same percentage difference. Regardless of whether there's 1 or 10,000 samples, the latter is 20% faster. – Ryan Peschel Jan 24 '21 at 14:40
  • I meant repeating *using* the data, not repeating building it -- but do whatever your project code will do. Also be sure to try changing which test runs first, as these things are sensitive to that kind of thing. But again, fundamentally I wouldn't try a synthetic benchmark at all, because they're inherently flawed. – T.J. Crowder Jan 24 '21 at 14:51
  • If there basically isn't any code in your real project yet, then you're trying to solve a problem you don't have. Build the code the way you think it should be and deal with it **if** you have a performance problem. – T.J. Crowder Jan 24 '21 at 14:52
  • I feel as if completely rewriting your entire game engine _after_ you've noticed a performance problem is a nightmare waiting to happen. Performance has historically been a concern in the games I've developed, so I'm looking to properly design this prior to it being an issue. Also, the updated benchmark _is_ repeatedly using it, not repeatedly building it. – Ryan Peschel Jan 24 '21 at 14:54
  • Sure enough, it does, sorry about that. I do still suggest (if you persist with benchmarking this) that you mix them up and look at how high-quality benchmarking tools work. [This post](https://stackoverflow.com/a/513259/157247) is regarding Java but most of the tips there apply to modern JavaScript engines as well (there's a lot of cross-pollination). Regarding re-writing: **Obviously** you pay attention as you go for early signs of trouble. Anyway, good luck with things! Hope the game engine goes well. – T.J. Crowder Jan 24 '21 at 15:03
  • JavaScript isn't going to auto-vectorize with SIMD while JITting. That's one of the biggest advantages that an SoA layout allows, but you aren't taking advantage of it. Also, if your code is the only thread running on an otherwise-idle *desktop* machine, you have much more memory bandwidth available to your thread than on a typical server, or on a busy machine with all cores competing for memory access. (Intel Xeons have lower max per-core DRAM memory bandwidth due to higher latency interconnects, but higher aggregate bandwidth with all cores busy. That's assuming you miss in private L2) – Peter Cordes Jan 24 '21 at 18:48

2 Answers


JavaScript isn't going to auto-vectorize with SIMD while JITting. That's one of the biggest advantages that an SoA layout allows, but you aren't taking advantage of it. (And AFAIK can't easily in JS.)

Also, if your code is the only thread running on an otherwise-idle desktop machine, you have much more memory bandwidth available to your thread than on a typical server, or on a busy machine with all cores competing for memory access. (Intel Xeons have lower max per-core DRAM memory bandwidth due to higher latency interconnects, but higher aggregate bandwidth with all cores busy. That's assuming you miss in private L2 cache.) So your benchmark probably tested a case where you have lots of excess memory bandwidth.

You might also see more benefit from SoA if your objects are bigger. Your AoS loop is reading 2 of the 3 fields of every array element, so only a small part of the data brought into cache is "wasted". If you try with more fields that your loop doesn't use, you may see a bigger advantage.
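For example, here is one way you could pad the question's AoS objects with fields the hot loop never reads. This is a rough sketch of my own, not code from the answer: the extra field names are made up, and it reuses randomInt, randomStr, count, and samples from the question.

function benchmarkArrayOfStructsPadded() {
  let AOS = [];

  for (let i = 0; i < count; i++) {
    AOS.push({
      health: randomInt(),
      name: randomStr(),
      damage: randomInt(),
      // Hypothetical padding the hot loop never touches, just to spread
      // the fields we do read further apart in memory.
      posX: Math.random(),
      posY: Math.random(),
      level: randomInt(),
      guild: randomStr(),
    });
  }

  let t1 = performance.now();
  let sum = 0;

  for (let x = 0; x < samples; x++) {
    for (let i = 0; i < AOS.length; i++) {
      sum += AOS[i].health + AOS[i].damage; // still reads only 2 fields
    }
  }

  // Logging sum keeps the JIT from treating the summing loop as dead code.
  console.log(performance.now() - t1, sum);
}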

Peter Cordes

In JavaScript, there is no guarantee that an Array is a true array: for all you know, the engine could be storing elements all over memory, nullifying cache locality. To force the engine to store an array in contiguous memory, you must use a TypedArray.

TypedArrays are also intrinsically faster than regular arrays since the JS engine doesn't have to do any guesswork whatsoever about usage patterns.
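As a rough sketch of what that could look like for the benchmark in the question (my adaptation, not part of this answer): the numeric columns go into typed arrays, while the strings stay in a plain array since they can't be stored in a TypedArray. It reuses randomInt, randomStr, count, and samples from the question.

function benchmarkStructOfTypedArrays() {
  // Numeric columns live in contiguous, unboxed typed arrays.
  let health = new Int32Array(count);
  let damage = new Int32Array(count);
  let name = []; // strings can't go in a TypedArray

  for (let i = 0; i < count; i++) {
    health[i] = randomInt();
    damage[i] = randomInt();
    name.push(randomStr());
  }

  let t = performance.now();
  let sum = 0;

  for (let x = 0; x < samples; x++) {
    for (let i = 0; i < count; i++) {
      sum += health[i] + damage[i];
    }
  }

  console.log(performance.now() - t, sum);
}

benchmarkStructOfTypedArrays();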

EnderShadow8