
As far as I know, the smallest unit in C is a byte. Where does this constraint come from? The CPU?

For example, how can I write a nibble or a single bit to a file?

woliveirajr
cyraxjoe
  • I assume you are trying to write to disk. Most filesystems store everything in blocks. For example, on Linux with ext3 the block size is typically 4 KB, so a file smaller than that still occupies a whole block and the rest of that block is wasted. – beatgammit Jul 14 '11 at 23:41

3 Answers


No, you can't. Files are organized in bytes; a byte is the smallest piece of data you can save.

And, actually, that 1 byte will generally occupy more than 1 byte of disk space. Depending on the OS, the file system type, etc., everything you save as a file will use at least one block, and the block size varies with the file system you're using. So your single bit will be written as a whole byte, and that byte can occupy as much as 4 KB of your disk.

Wikipedia has more on the byte being the smallest data unit in many computers.

woliveirajr

Actually, it's a `char`; `byte` is not a standard C type.

The constraint comes from the C standard and is tautological: `char` is the smallest complete type in C because it is defined as such, and the sizes of all other types are defined as multiples of the size of `char` (whose size, `sizeof(char)`, is always 1).

Now, the number of bits in a `char` can vary from platform to platform; it is given by the `CHAR_BIT` macro in `<limits.h>`, which the standard requires to be at least 8. The number of bits ultimately tends to be hardware-defined, though most systems these days use 8-bit chars. `char` is supposed to represent the smallest addressable unit of memory (again, by definition).

Jonathan Grynspan
  • We're getting into semantics and possibly pedantry, but the historical definition of a byte was the smallest complete type that the CPU architecture could address, making it synonymous with a C char and similarly variable in bits. I'm not sure that definition has held, though; RISC machines that can access only word-aligned values, versus the need for compact strings and the ability to move data between machines, have muddied things, I think. – Tommy Jul 14 '11 at 23:50
  • Yes, they're historically synonymous; it's just that the C type is `char`, and the OP was asking about C. – Jonathan Grynspan Jul 15 '11 at 01:15
  • @Tommy: *Most* RISC machines have byte-addressable memory and do have byte load/store instructions. The only notable modern exception was early DEC Alpha, which had byte-addressable memory but *only* aligned word (4 byte) / double-word (8 byte) load/store instructions. Access to a single byte was only possible via software read-modify-write of the containing word, which wouldn't be atomic unless you used LL/SC. A C11 / C++11 implementation on Alpha would presumably use CHAR_BIT = 32. [Can modern x86 hardware not store a single byte to memory?](https://stackoverflow.com/q/46721075) – Peter Cordes Mar 28 '20 at 18:59
  • But C / C++ on RISCs like MIPS and RISC-V have no problem with `CHAR_BIT = 8` because byte load/store instructions don't architecturally disturb surrounding bytes. No need to do an aligned-word load when all you want is a byte. (Even Alpha added byte load/store in later revisions.) There are some modern DSPs with word-addressable memory where CHAR_BIT has to be 24 or 32; the individual bytes of a word don't even have unique addresses. – Peter Cordes Mar 28 '20 at 19:01

Moreover, data is written to disk in sectors (e.g. 512 bytes or so). If we need to change only one byte, the whole sector is read, patched, and written back.

But you don't need to think about sectors. To change one bit, just seek to the appropriate byte position in the file, read that byte, change the bit, and write the result back.

Mike Mozhaev