You might want to have a look at loop unrolling.
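When the body of a loop is short enough, the compare and branch needed to check the loop condition on every iteration can be expensive relative to the work done in the body. Unrolling does several iterations' worth of work per check.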
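To make the idea concrete, here is a minimal sketch of plain manual unrolling by four, with a separate loop for the leftover digits. The name parse_unrolled4 is hypothetical (it is not from your code), and the sketch assumes every character is a decimal digit and that the result fits in 64 bits:

```cpp
#include <cstdint>

// Hypothetical helper for illustration: parses `len` decimal digits.
// Assumes s[0..len-1] are all in '0'..'9' and the value fits in uint64_t.
inline uint64_t parse_unrolled4(const char* s, int len)
{
    uint64_t val = 0;
    int i = 0;
    // Main loop: four digits per loop-condition check.
    for (; i + 4 <= len; i += 4) {
        val = val * 10 + (s[i]     - '0');
        val = val * 10 + (s[i + 1] - '0');
        val = val * 10 + (s[i + 2] - '0');
        val = val * 10 + (s[i + 3] - '0');
    }
    // Remainder loop: the 0 to 3 digits that did not fill a group of four.
    for (; i < len; ++i)
        val = val * 10 + (s[i] - '0');
    return val;
}
```

The remainder loop is the part that Duff's device folds into the main loop by jumping into the middle of the unrolled body.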
A specific and interesting way of implementing loop unrolling is called Duff's device: https://en.wikipedia.org/wiki/Duff%27s_device
Here's the version for your function:
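    inline uint64_t strtol_duff(const char* s, int len)
    {
        uint64_t val = 0;
        // Guard: without it, len == 0 would enter at case 0 and read 8 characters.
        if (len <= 0)
            return val;
        int n = (len + 7) / 8;  // number of passes through the do-while
        int i = 0;
        // Jump into the middle of the unrolled body so that the first pass
        // handles the len % 8 leftover digits; later passes handle 8 each.
        switch (len % 8) {
        case 0: do {
                    val = val * 10 + (*(s + i++) - '0');
        case 7:     val = val * 10 + (*(s + i++) - '0');
        case 6:     val = val * 10 + (*(s + i++) - '0');
        case 5:     val = val * 10 + (*(s + i++) - '0');
        case 4:     val = val * 10 + (*(s + i++) - '0');
        case 3:     val = val * 10 + (*(s + i++) - '0');
        case 2:     val = val * 10 + (*(s + i++) - '0');
        case 1:     val = val * 10 + (*(s + i++) - '0');
                } while (--n > 0);
        }
        return val;
    }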
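To be honest, I don't think you will see a huge benefit in your case, because the loop's body is not that tiny: each iteration already does a multiply, an add and a subtract, so the condition check is a smaller share of the work. It's all very much system-dependent and requires experimentation (like most optimizations).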
Good compiler optimizers might unroll the loop automatically if it is actually beneficial.
But it's worth a try.
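If you would rather let the compiler do the unrolling, one option is to keep the simple loop and give it a hint. The sketch below assumes GCC 8 or later, where `#pragma GCC unroll n` is documented; other compilers may ignore the pragma or warn about it, and the name parse_plain is hypothetical. As before, it assumes digit-only input that fits in 64 bits:

```cpp
#include <cstdint>

// Hypothetical helper for illustration: the straightforward loop,
// with a GCC-specific request to unroll it up to 8 times.
inline uint64_t parse_plain(const char* s, int len)
{
    uint64_t val = 0;
    // The pragma must sit immediately before the loop it applies to.
#pragma GCC unroll 8
    for (int i = 0; i < len; ++i)
        val = val * 10 + (s[i] - '0');
    return val;
}
```

The result is the same with or without the pragma, so you can time both variants and compare the generated assembly to see whether the hint changes anything on your system.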