Context: I am writing an assembler. Imagine you write code from scratch, with no toolchain support: you are alone with the computer in front of you and all you have is the CPU documentation, as in the old days.

When you need to write in binary, how do you know how to shape the bits at the lowest level?

For example, how do I convert #253 (it can be any value up to 2^32, and could be hexadecimal as well) to a binary immediate? Am I obliged to parse it byte by byte from right to left, multiply by 10 (or 16 for hex) every time, and store the result in the accumulating register?

Is this what assemblers do at the lowest level to form the stream of bits (adding and shifting)?

Kroma
  • Assemblers just translate source code into an output format ("just" is an oversimplification here), so what is your real question? How to translate a string into a number? That's a basic task, and yes, it is accomplished basically by adding and multiplying, while taking care of caveats like overflow, charset and charset encoding. – Margaret Bloom Jan 04 '17 at 11:35
  • @MargaretBloom I am writing an assembler and my main concern is to write a stream of bits efficiently. I thought about a jump table as well, and hex conversion to speed that up. Decimal is very poor for optimizations. – Kroma Jan 04 '17 at 11:44
  • I'd suggest you search for "compiler building tutorial"; there are a lot of very good tutorials teaching the things you need, e.g. state machines for decoding values. Even assemblers are a sort of compiler: they compile assembly to machine code. – Tommylee2k Jan 04 '17 at 11:47
  • Note that converting binary to hex "on the fly" is much easier than converting to decimal (and the other way round), which is one of the reasons hex was introduced at all. – tofro Jan 04 '17 at 13:23

1 Answer

Yes, you read in the characters 2, 5 and 3, one by one (or read many at once if that's what the hardware does and then look at them sequentially, one by one).

And then you can convert the ASCII characters into decimal digits (by subtracting the ASCII code of the character `0`, which is 48 decimal) and multiply the digits by the appropriate powers of 10 (or whatever the base is).
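As a rough illustration of the character-to-digit step, here is a minimal C sketch. The function name `digit_value` and the choice to accept both upper- and lower-case hex letters are assumptions made for this example, not something a particular assembler prescribes.

```c
/* Map one ASCII character to its digit value in the given base (2..16).
   Returns -1 if the character is not a valid digit in that base.
   The name digit_value is hypothetical, chosen for this sketch. */
int digit_value(char c, int base)
{
    int d;

    if (c >= '0' && c <= '9')
        d = c - '0';            /* '0' is 48 in ASCII, so '2' - '0' == 2 */
    else if (c >= 'a' && c <= 'f')
        d = c - 'a' + 10;       /* hex letters a..f map to 10..15 */
    else if (c >= 'A' && c <= 'F')
        d = c - 'A' + 10;
    else
        return -1;              /* not a digit at all */

    return d < base ? d : -1;   /* e.g. '9' is not a valid octal digit */
}
```

For instance, `digit_value('2', 10)` gives 2 and `digit_value('F', 16)` gives 15, while `digit_value('a', 10)` is rejected with -1.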

Doing it as ((0+2)*10+5)*10+3 is both easy and handy.
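The left-to-right accumulation above could be sketched in C roughly as follows. The name `parse_u32`, its return convention, and the overflow check are assumptions made for this example; it also assumes any prefix such as `#` has already been stripped and that `base` is 10 or 16.

```c
#include <stdint.h>

/* Parse an unsigned literal in the given base (assumed to be 10 or 16)
   into a 32-bit value, accumulating as total = total * base + digit.
   Returns 0 on success, -1 on an empty string, an invalid character,
   or a value that does not fit in 32 bits. The name is hypothetical. */
int parse_u32(const char *s, unsigned base, uint32_t *out)
{
    uint32_t total = 0;

    if (*s == '\0')
        return -1;

    for (; *s != '\0'; s++) {
        unsigned d;

        if (*s >= '0' && *s <= '9')
            d = (unsigned)(*s - '0');
        else if (*s >= 'a' && *s <= 'f')
            d = (unsigned)(*s - 'a') + 10;
        else if (*s >= 'A' && *s <= 'F')
            d = (unsigned)(*s - 'A') + 10;
        else
            return -1;
        if (d >= base)
            return -1;

        /* Reject the digit if total * base + d would exceed UINT32_MAX. */
        if (total > (UINT32_MAX - d) / base)
            return -1;

        total = total * base + d;   /* ((0+2)*10+5)*10+3 for "253" */
    }

    *out = total;
    return 0;
}
```

With this sketch, both `parse_u32("253", 10, &v)` and `parse_u32("FD", 16, &v)` leave 253 in `v`. Processing the digits left to right means the number of digits does not need to be known in advance.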

Multiplication by 10 does not need to involve actual multiplication, which may be a slow operation (or the CPU may not even have an instruction for multiplication), but can be carried out as a combination of shifts and adds, e.g. 2*10 = 2*8 + 2*2 = (2 shifted 3 positions left) + (2 shifted 1 position left).
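The shift-and-add decomposition can be written in C as a small sketch. The helper name `mul10` is made up for this example; on a CPU without a multiply instruction the same two shifts and one add would be written directly in assembly.

```c
#include <stdint.h>

/* Multiply by 10 without a multiply instruction:
   x * 10 == x * 8 + x * 2 == (x << 3) + (x << 1).
   Like an ordinary 32-bit multiply, the result wraps modulo 2^32. */
uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

For example, `mul10(2)` is `16 + 4`, i.e. 20, matching the arithmetic above.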

If nobody has written the code to do this for you, you have to write it. Conventional CPUs don't contain logic for parsing and converting from text to numbers or the other way around. They only provide simple building blocks (relatively simple instructions), from which you can construct everything.

Alexey Frunze
  • Alexey, so at the end of the day, I have to create a jump table containing the proper values I want (in a loop up to 256, increment a register by one and store its binary representation in that table at the proper offset). When a conversion is needed, I load the entry into a register and do whatever I need with that representation (create the registers, store them and so on). And problem solved? – Kroma Jan 04 '17 at 17:49
  • instead of binary representation, I meant binary value – Kroma Jan 04 '17 at 18:02
  • Sorry, I lost your train of thought. However, there are other interesting problems to solve when implementing an assembler. E.g. [this one](http://stackoverflow.com/q/41418521/968261). And it's more important to have a fast solution to that one than getting the maximum performance out of a numeric parser. – Alexey Frunze Jan 04 '17 at 19:51
  • For x86, optimal `total = total*10 + digit` takes two LEA instructions: `lea eax, [rax*4 + rax]` to scale by 5, and `lea eax, [rax*2 + rcx]` to do the remaining `*2` and add the digit from ECX. (This comes after already doing `c -= '0'` and checking that the result is a decimal digit; doing the `-'0'` as part of a 3-component LEA has extra latency on most CPUs.) See [NASM Assembly convert input to integer?](https://stackoverflow.com/a/49548057) (my answer there also has links to converting the other way, int -> string). – Peter Cordes Mar 22 '22 at 20:57