
This depends on the length of the string, so let's take three cases: the easiest case, the worst case, and one in between, all for a 32-bit unsigned integer. The values will be 0, 4294967295 and 67295 (a string of half the maximum length).

Let's say it runs on a modern CPU, like an i7 (Nehalem).

I want to show with concrete numbers how CPU-intensive this operation is. The algorithm is typically a small loop where one iteration needs the result of the previous one, so the code will not take advantage of superscalar CPU optimizations.
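For illustration, here is a minimal C sketch of the kind of loop I mean (the function name is mine, just for this example); each iteration needs the previous value of result, so the multiply/add chain is strictly serial:

#include <stddef.h>
#include <stdint.h>

/* Naive string-to-uint32 conversion: 'result' carries a dependency
   from one iteration to the next, so iterations cannot overlap. */
static uint32_t str_to_u32(const char *s, size_t len)
{
    uint32_t result = 0;
    for (size_t i = 0; i < len; i++)
        result = result * 10 + (uint32_t)(s[i] - '0');
    return result;
}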

Is there any hard-wired instruction to perform this operation on a modern CPU?

EDIT: I tried to answer this myself and did some research.

Using Agner Fog's 'Instruction Tables' and the code from this answer:

;parameters: esi is a pointer to the string, ecx is the length of the string
string_to_int:       ; 
xor ebx,ebx          ; clear ebx                    > 1,1
.next_digit:
movzx eax,byte[esi]  ;                              > 1,1
inc esi              ;                              > 1,1
sub al,'0'           ; convert from ASCII to number > 1,1
imul ebx,10          ;                              > 1,3
add ebx,eax          ; ebx = ebx*10 + eax           > 1,1
loop .next_digit     ; while (--ecx)                > 6
mov eax,ebx          ;                              > 1,1
ret

The first and last instructions run once. The sum of the latencies and execution costs of the other instructions is 18 per iteration, so the answer to the question should be 4 + 18 * string.length.

  • "0" = 22 cycles
  • "4294967295" = 184 cycles
  • "67295" = 94 cycles

It is far less than I thought. And this is just the conversion; it does not count all the operations needed to handle the string: copying from the NIC buffer to RAM, from RAM to the CPU cache...

Am I counting the right things? Is there any micro-optimization expert who can tell me if this looks right? (Mystical maybe? ;))

bokan
  • http://stackoverflow.com/questions/20819206/8086-assembly-convert-input-string-to-integer – Cratylus Sep 17 '14 at 17:52
  • @Cratylus: there are lots of questions about how to convert a string to an int, but they don't count the CPU cycles (how many cycles for a sub, a mul, an add...) – bokan Sep 17 '14 at 17:57
  • Computers use bytes, strings are for humans. How long the conversion takes is irrelevant, it is always a *lot* faster than the required I/O. – Hans Passant Sep 17 '14 at 18:22
  • There are currently 4 different i7 base architectures (Nehalem, Sandy Bridge, Ivy Bridge, Haswell, not counting Haswell-E) and two more have been demonstrated (Broadwell, Skylake). Your question generalizes too much. How to find out is easy: measure it. – Marco van de Voort Sep 17 '14 at 18:23
  • @HansPassant Most of the conversions done today on webservers are for XML and JSON serialisation. Browser/webservice consumer <-> Front server <-> SQL Database. I wonder why this question deserves downvotes. – bokan Sep 17 '14 at 19:15
  • Sure. How do you think it got from one server to another? XML and JSON are popular because it doesn't matter at all that parsing them is not as efficient and it's *way* easier to troubleshoot than binary data. Don't we all love those really fast cores, not just good for games. – Hans Passant Sep 17 '14 at 19:21
  • It's a concern for me to waste CPU cycles, and therefore energy (CO2), doing things that can be avoided. There are tons of communication protocols that allow human-readable packets for development and then switch to binary mode for production. My question here is to explain to a customer why XML and JSON are a stupid choice for transferring gigabytes of raw geospatial data that become hundreds of GB of XML and JSON. – bokan Sep 17 '14 at 19:31
  • @HansPassant: Thinking this way leads some of my customers right into a performance wall, just because they replaced binary communication with XML. Also, I/O is way faster than conversion when the data comes through 10Gb NICs or SSD arrays. – bokan Sep 17 '14 at 21:57
  • Except when you are mostly dealing with number-string I/O, those conversions are negligible. – phuclv Sep 18 '14 at 03:52
  • As has already been mentioned, you're better off doing actual measurements than trying to count clock cycles manually when you're dealing with modern CPU architectures (see the sketch after these comments). – Michael Sep 18 '14 at 07:47
  • Hans is correct about IO mattering more than parsing (even if you only look at cache misses alone). Given this, it'd make a lot more sense to ignore parsing and validation costs and look at input data size alone. E.g. an integer may cost 4 bytes (regardless of value) while a string costs a minimum of between 2 and 10 bytes (including some sort of separator). If small values (less than 4 digits) are more frequent, something like "comma separated values" can be more efficient than "binary". – Brendan Sep 18 '14 at 07:48
  • You are all very nice to care about all those questions I didn't ask. My question here is "how many CPU cycles for a string-to-int conversion?", nothing else. Of course I/O, memory use, readability, bla bla bla are important too, but that's not the question here. This question has been downvoted, and I really wonder why. Is it an insult to ask about CPU cycles on SO? – bokan Sep 19 '14 at 19:41
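For anyone who wants to check the estimate empirically rather than from the tables, here is a minimal measurement sketch (my own illustration, assuming GCC or Clang on x86-64; __rdtsc comes from x86intrin.h and counts reference cycles, so a serious benchmark would also pin the core, warm the caches and control frequency scaling):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>   /* __rdtsc() */

/* same naive conversion loop as in the question's assembly */
static uint32_t str_to_u32(const char *s, size_t len)
{
    uint32_t v = 0;
    for (size_t i = 0; i < len; i++)
        v = v * 10 + (uint32_t)(s[i] - '0');
    return v;
}

int main(void)
{
    const char *s = "4294967295";
    size_t len = strlen(s);
    volatile uint32_t sink = 0;      /* keeps the compiler from removing the loop */
    enum { RUNS = 1000000 };

    uint64_t start = __rdtsc();
    for (int i = 0; i < RUNS; i++)
        sink = str_to_u32(s, len);
    uint64_t end = __rdtsc();

    printf("about %.1f reference cycles per conversion (last value %u)\n",
           (double)(end - start) / RUNS, (unsigned)sink);
    return 0;
}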

2 Answers


The algorithm you have to convert a string into an integer value is the generally accepted one (32-bit). However, there is more than one algorithm for the conversion (not to mention different instruction sets, microarchitectures, cache sizes, etc...). Even if you pin down every one of those things, there still isn't a single answer to the question.

That said, I think this may be a case of premature optimization. If I understand correctly, you're concerned about the additional overhead introduced by a non-binary protocol. Binary protocols are generally an extreme measure taken to increase performance, and they are usually adopted to limit latency rather than to increase throughput.

There are a lot of benefits to text protocols that you're forced to give up by using a binary protocol (compression characteristics, ease of use, ease of debugging, etc...). You also need to consider that not all CPU architectures are little-endian (network byte order is specifically big-endian). Make sure the protocol is really the bottleneck before you optimize for it.
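For example, the usual way to handle the byte-order issue in C is the htonl/ntohl family (a small sketch of my own, assuming a POSIX system where these are declared in arpa/inet.h):

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl/ntohl on POSIX systems */

int main(void)
{
    uint32_t host_value = 4294967295u;

    /* sender: convert to network byte order (big-endian) before writing */
    uint32_t wire_value = htonl(host_value);

    /* receiver: convert back to host byte order after reading */
    uint32_t decoded = ntohl(wire_value);

    printf("host %u -> decoded %u\n", host_value, decoded);
    return 0;
}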

Jason

Interpreting the contents of an XML file consumes copious amounts of CPU cycles; a large one will require CPU seconds or longer to parse. If you keep the large XML file as your database of values, just finding the data will, on average, take millions of times longer than converting the data to this or that numeric format.

Ergo, (if possible) do a one-time conversion of the file into a vector, matrix or other structure of binary values that you can easily index into. If indexing is not possible, just finding the data in the vector will still be a lot faster than doing the same in the XML file. In any case, even in the one-time conversion, the job of finding the data in the XML file will be much greater than converting it once you've found it.
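A rough sketch of that idea in C (my own illustration, with a comma-separated string standing in for the values pulled out of the XML during the one-time conversion):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* One-time conversion: turn a textual list of numbers into a binary
   array; afterwards every lookup is a plain indexed load, with no
   parsing or string-to-int conversion on the hot path. */
int main(void)
{
    const char *text = "0,4294967295,67295";   /* stand-in for the parsed XML values */
    uint32_t values[3];
    size_t count = 0;

    const char *p = text;
    while (count < 3) {
        char *end;
        values[count++] = (uint32_t)strtoul(p, &end, 10);
        if (*end != ',')
            break;
        p = end + 1;
    }

    /* later: direct indexed access, no conversion needed */
    printf("values[1] = %u (count = %zu)\n", values[1], count);
    return 0;
}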

As to your question itself, I would have guessed 10 to 15 cycles per loop. I have read that the loop[cond] instruction is a carryover from earlier processor generations that may have stopped being useful around the time the Pentium was introduced; assembly output from gcc confirms that it is seldom used. As I understand it, this has to do with the instruction not being easily reordered and with it testing status flags that might not be available when the instruction starts executing, leading to processor stalls. Its outcome (jump taken) should be fairly predictable, though.

Olof Forshell