Writing binary data in C++

Question

I am in the process of building an assembler for a rather unusual machine that a few other people and I are building. This machine takes 18-bit instructions, and I am writing the assembler in C++.

I have collected all of the instructions into a vector of 32-bit unsigned integers, none of which is larger than what can be represented with an 18-bit unsigned number.

However, there does not appear to be any way (as far as I can tell) to output such an unusual number of bits to a binary file in C++. Can anyone help me with this?

(I would also be willing to use C's stdio and FILE structures. However, there still does not appear to be any way to output such an arbitrary number of bits.)

Thank you for your help.

Edit: It looks like I didn't specify how the instructions will be stored in memory well enough.

Instructions are contiguous in memory. Say the instructions start at location 0 in memory:

The first instruction will be at bit 0, the second instruction at bit 18, the third instruction at bit 36, and so on.

There are no gaps and no padding between instructions. There can be a few superfluous 0s at the end of the program if needed.

The machine uses big endian instructions. So an instruction stored as 3 should map to: 000000000000000011
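
As a quick sanity check of that mapping, here is a minimal sketch that renders the low 18 bits of a word with the most significant bit first. The helper name `to_bits18` is made up for illustration and is not part of the assembler.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Hypothetical helper: render the low 18 bits of `word` as text,
// most significant bit first. Bits above bit 17 are ignored.
std::string to_bits18(std::uint32_t word) {
    return std::bitset<18>(word).to_string();
}
```

With this, `to_bits18(3)` yields `000000000000000011`, matching the layout described above.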

Files are byte oriented. If you have more bits (you have 18) than fit in a byte, then things do get hairy. Which bits come first (endianness), and what do you do if 18 modulo (the number of bits in a byte) is not 0? (Do you leave spare bits as 0, or something else?) How many bits are there in your byte, from the point of view of the C++ program doing the output? (I never thought I'd get to ask that!) (Is the C++ program outputting to something that uses 8-bit bytes?) Basically, in what format does your machine expect instructions? – Thanatos, Oct 16 '11 at 22:09
The bytes are stored big endian. The machine doesn't really use the concept of 'bytes' per se. All data types are 18 bits, so I guess you could say one byte is 18 bits on this machine. Each instruction is 18 bits, followed immediately by the next instruction with no padding. If needed on the machine that is running the assembler (an x86-64 machine, using ext4 for the filesystem), we can pad the end of the file with 0s. – Leif Andersen, Oct 16 '11 at 22:21
You have 18-bit "instructions" then, but you're outputting them on an x86-64, where bytes are 8 bits wide. What do we do about the disparity between these? Does the start of an instruction align (in the file you're outputting) with a byte boundary, or are they packed tightly, with no padding? – Thanatos, Oct 16 '11 at 22:28
They are packed tightly with no padding. If, after writing out all of the instructions, there are dangling bits (that don't fit evenly into a byte), I can simply pad them with 0s. Ideally, there wouldn't really be any notion of an 8-bit byte in this machine, but we do need to assemble the code on a more conventional machine. As such, there should be no padding in between instructions, even though they don't fit into the normal byte boundaries. (In other words, it would be nice to just treat it as an array of bits.) – Leif Andersen, Oct 16 '11 at 22:42
Load it into SRAM using the Xilinx tool. I haven't actually done it yet as I currently don't have anything to load, but it takes a binary file. – Leif Andersen, Oct 17 '11 at 04:49

score 3 · Accepted Answer · answered Oct 16 '11 at 22:26

Keep an eight-bit accumulator.
Shift bits from the current instruction into the accumulator until either:
- The accumulator is full; or
- No bits remain of the current instruction.
Whenever the accumulator is full:
- Write its contents to the file and clear it.
Whenever no bits remain of the current instruction:
- Move to the next instruction.
When no instructions remain:
- Shift zeros into the accumulator until it is full.
- Write its contents.
- End.

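For n instructions, this will leave (8 - 18n mod 8) zero bits after the last instruction. Note that when 18n is a multiple of 8 (that is, when n is a multiple of 4), this is a whole extra zero byte unless the final padding step is skipped for an empty accumulator.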
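
The steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `pack18` is made up, it assumes bits are emitted most significant first (as the question's edit describes), and it skips the final padding byte when the accumulator is already empty.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: pack the low 18 bits of each word, MSB first,
// into 8-bit bytes with no padding between instructions.
std::vector<std::uint8_t> pack18(const std::vector<std::uint32_t>& instructions) {
    std::vector<std::uint8_t> out;
    std::uint8_t acc = 0;  // the eight-bit accumulator
    int filled = 0;        // number of bits currently held in the accumulator

    for (std::uint32_t word : instructions) {
        // Shift the 18 instruction bits into the accumulator one at a time.
        for (int bit = 17; bit >= 0; --bit) {
            acc = static_cast<std::uint8_t>((acc << 1) | ((word >> bit) & 1u));
            if (++filled == 8) {  // accumulator full: emit it and clear it
                out.push_back(acc);
                acc = 0;
                filled = 0;
            }
        }
    }

    // No instructions remain: pad the last partial byte with zeros.
    // Assumption: nothing is emitted if the accumulator is empty.
    if (filled > 0) {
        out.push_back(static_cast<std::uint8_t>(acc << (8 - filled)));
    }
    return out;
}
```

The resulting buffer can then be written with a `std::ofstream` opened with `std::ios::binary`, passing `reinterpret_cast<const char*>(bytes.data())` and `bytes.size()` to `write`.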

score 2 · Answer 2 · edited May 23 '17 at 11:51

You could maybe represent your data in a bitset and then write the bitset to a file. That wouldn't work directly with fstream's write function, but there is a way that is described here...
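
One possible way to get from a bitset to bytes (not necessarily the approach referred to above) is to copy the bits into a byte buffer first. The sketch below is only an illustration: the name `bitset_to_bytes` is made up, and it assumes that bit index 0 of the bitset is the first bit of the stream and lands in the most significant bit of the first byte.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy a bitset into bytes.
// Assumption: bits[0] is the first bit of the stream and becomes the
// most significant bit of byte 0; unused trailing bits are left as 0.
template <std::size_t N>
std::vector<std::uint8_t> bitset_to_bytes(const std::bitset<N>& bits) {
    std::vector<std::uint8_t> bytes((N + 7) / 8, 0);
    for (std::size_t i = 0; i < N; ++i) {
        if (bits[i]) {
            bytes[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
        }
    }
    return bytes;
}
```

The byte buffer, unlike the bitset itself, can be handed to a stream's `write` function. Note that `std::bitset` needs its size at compile time, so a whole program would need a different container.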

answered Oct 16 '11 at 22:12 by niktehpui (edited by Community)

That's the crux of the issue though: how do you write that bitset to the file? Since the OP hasn't specified what format his unusual machine uses, it's hard to say. – Thanatos Oct 16 '11 at 22:14

score 2 · Answer 3 · answered Oct 16 '11 at 22:13

There are a lot of ways you can achieve the same end result (I am assuming the end result is a tight packing of these 18 bits).

A simple method would be to create a bit-packer class that accepts the 32-bit words, and generates a buffer that packs the 18-bit words from each entry. The class would need to do some bit shifting, but I don't expect it to be particularly difficult. The last byte can have a few zero bits at the end if the original vector length is not a multiple of 4. Once you give all your words to this class, you can get a packed data buffer, and write it to a file.
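
A rough sketch of such a bit-packer class might look like the following. The names `BitPacker18` and `write_packed` are invented for illustration, and the sketch assumes each 18-bit word is emitted most significant bit first. It shifts whole words into a wider accumulator instead of handling one bit at a time.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical bit-packer: accepts 32-bit words and builds a tightly
// packed buffer of their low 18 bits (MSB first is an assumption).
class BitPacker18 {
public:
    // Append the low 18 bits of `word` to the packed stream.
    void push(std::uint32_t word) {
        pending_ = (pending_ << 18) | (word & 0x3FFFFu);
        pending_bits_ += 18;
        while (pending_bits_ >= 8) {  // flush every complete byte
            pending_bits_ -= 8;
            buffer_.push_back(static_cast<std::uint8_t>(pending_ >> pending_bits_));
        }
        pending_ &= (1ull << pending_bits_) - 1;  // keep only unflushed bits
    }

    // Return the packed bytes; a final partial byte is zero-padded.
    std::vector<std::uint8_t> bytes() const {
        std::vector<std::uint8_t> out = buffer_;
        if (pending_bits_ > 0) {
            out.push_back(static_cast<std::uint8_t>(pending_ << (8 - pending_bits_)));
        }
        return out;
    }

private:
    std::vector<std::uint8_t> buffer_;
    std::uint64_t pending_ = 0;  // bits not yet flushed (at most 7 between calls)
    int pending_bits_ = 0;
};

// Write the packed buffer to `path` as raw bytes; returns false on failure.
bool write_packed(const std::string& path, const std::vector<std::uint8_t>& data) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}
```

Typical use would be to call `push` for every entry of the instruction vector and then pass `bytes()` to `write_packed`.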

Answer 4 · answered Oct 16 '11 at 22:32 · Thanatos

The short answer: Your C++ program should output the 18-bit values in the format expected by your unusual machine.

We need more information: specifically, the format that your "unusual machine" expects, or more precisely, the format that your assembler should be outputting. Once you understand the format of the output you're generating, the answer should be straightforward.

One possible format — I'm making things up here — is that we could take two of your 18-bit instructions:

         instruction 1       instruction 2     ...
       MSB            LSB  MSB            LSB  ...
bits → ABCDEFGHIJKLMNOPQR  abcdefghijklmnopqr  ...

...and write them in an 8-bits/byte file thus:

KLMNOPQR CDEFGHIJ 000000AB klmnopqr cdefghij 000000ab ...

...this is basically arranging the values in "little-endian" form, with 6 zero bits padding the 18-bit values out to 24 bits.
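
If that made-up padded format were the one wanted, a minimal sketch of the encoder could be as simple as the following. The function name `encode_padded_le` is hypothetical, and the 24-bit little-endian layout is the assumption stated above, not a known property of the machine.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encoder for the padded layout: each 18-bit instruction
// becomes three bytes, least significant byte first.
std::vector<std::uint8_t> encode_padded_le(const std::vector<std::uint32_t>& instructions) {
    std::vector<std::uint8_t> out;
    out.reserve(instructions.size() * 3);
    for (std::uint32_t word : instructions) {
        out.push_back(static_cast<std::uint8_t>(word & 0xFFu));          // KLMNOPQR
        out.push_back(static_cast<std::uint8_t>((word >> 8) & 0xFFu));   // CDEFGHIJ
        out.push_back(static_cast<std::uint8_t>((word >> 16) & 0x03u));  // 000000AB
    }
    return out;
}
```

Because every instruction starts on a byte boundary here, no accumulator is needed, at the cost of 6 wasted bits per instruction.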

But I'm assuming: the padding, the little-endianness, the number of bits / byte, etc. Without more information, it's hard to say if this answer is even remotely near correct, or if it is exactly what you want.

Another possibility is a tight packing:

ABCDEFGH IJKLMNOP QRabcdef ghijklmn opqr0000

or

ABCDEFGH IJKLMNOP abcdefQR ghijklmn 0000opqr

...but I've made assumptions about where the corner cases go here.

score 0 · Answer 5 · answered Oct 16 '11 at 22:42

Just output them to the file as 32 bit unsigned integers, just as you have in memory, with the endianness that you prefer.

Then, in the loader, EEPROM writer, JTAG tool, or whatever method you use to send the code to the machine, drop the 14 most significant bits of each 32-bit word that is read and send only the remaining 18 bits to the target.

Unless, of course, you have written a FAT driver for your machine...
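
A small sketch of both halves of this idea, under stated assumptions: the names `encode_words_be` and `load_instruction` are made up, big-endian byte order is picked arbitrarily as "the endianness that you prefer", and the loader side is shown as plain C++ even though the real loader may be a separate tool.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical writer side: serialize each instruction as a 4-byte
// big-endian word (the byte order is an arbitrary choice here).
std::vector<std::uint8_t> encode_words_be(const std::vector<std::uint32_t>& instructions) {
    std::vector<std::uint8_t> out;
    out.reserve(instructions.size() * 4);
    for (std::uint32_t word : instructions) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            out.push_back(static_cast<std::uint8_t>(word >> shift));
        }
    }
    return out;
}

// Hypothetical loader side: rebuild one word from 4 big-endian bytes
// and keep only the low 18 bits.
std::uint32_t load_instruction(const std::uint8_t* p) {
    std::uint32_t word = (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
                         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
    return word & 0x3FFFFu;  // drop the 14 most significant bits
}
```

The file is larger than a tightly packed one (32 bits per instruction instead of 18), but the assembler needs no bit shifting across byte boundaries.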
