
Whenever I see code like this:

#include <stdint.h>
typedef uint8_t u8;   /* so the snippet compiles */

void test() {
    u8 x[] = {0, 22, 43};
}

I imagine a block of memory in RAM with the following layout.

Suppose x starts at address 0x00:

0x00    00000000
0x01    00010110
0x02    00101011

So at address 0x00 we have the binary of 0, at address 0x01 the binary of 22, and at address 0x02 the binary of 43.

But does that imply RAM really stores 8 bits at every memory address?

Or is it structured a bit like this, with a cell width of, say, 32 bits:

0x00    00000000 00000000 00000000 00000000
0x01    00000000 00000000 00000000 00010110
0x02    00000000 00000000 00000000 00101011

Is it electronically implemented like that? I'm really referring to modern RAM implementations, not those simpler 8-bit versions in old computers.

Take a look at this other instance:

#include <stdint.h>
typedef uint16_t u16;   /* so the snippet compiles */

void test() {
    u16 y[] = { 0, 32, 11 };
}

Now every element is 16 bits wide. What trips me up is that I can cast it to a u8 pointer and literally access every byte of every 16-bit element in the array. So I imagine the memory layout for this is:

0x00    00000000
0x01    00000000

0x02    00100000
0x03    00000000

0x04    00001011
0x05    00000000

or is it a bit like this:

0x00   00000000 00000000 00000000 00000000
0x02   00000000 00000000 00000000 00100000
0x04   00000000 00000000 00000000 00001011

Where from address 0x00 you can in reality also reach the byte at address 0x01. Can someone help me clarify how DRAM is internally structured? (Nothing too complicated, just a high-level view so I understand the right way to think about it.)

Is this related to the fact that CPUs want addresses aligned to multiples of two? I don't know where I heard that, but I'm really curious how this all works.

Edit:

I literally have no idea what protocol the DRAM and the CPU use to communicate; I've always assumed there are 32 address lines for the address, and another 32 data lines to read/write data from/to.

So in the second example with the u16 array, to read from RAM you set the address lines to 0x00, then read all 16 bits at once from the data lines. But if you cast the data to a u8 pointer and read it, you can still tell the code to read at address 0x00 and get the first byte of the first 16-bit element, then read at address 0x01 and get its second byte. Would both reads put address 0x00 on the address lines, and then extract the right byte from the data lines by shifting? I've never understood that.

    It doesn't really matter. Every byte is addressable. – NathanOliver Jul 04 '23 at 19:19
  • So I shouldn't really care about that? – gmmk Jul 04 '23 at 19:20
  • In most cases no, unless you really have a lot of data and you need to optimize your data structures to use every last byte of memory. – Pepijn Kramer Jul 04 '23 at 19:22
  • Yet I'd like to know, for example, why modern CPUs need addresses aligned to multiples of 2 to store or read data from RAM efficiently. – gmmk Jul 04 '23 at 19:25
  • Binary is a base-two system. That makes powers of two easy numbers to work with. – NathanOliver Jul 04 '23 at 19:27
  • "Multiples of two" are not "powers of two". – gmmk Jul 04 '23 at 19:28
  • Cache lines are aligned to address boundaries, therefore aligned reads are quicker & simpler because values won't overlap multiple cache lines: https://stackoverflow.com/questions/2006216/why-is-data-structure-alignment-important-for-performance – Alan Birtles Jul 04 '23 at 19:28
  • Ok, so basically there is not a one-to-one map of say 32-bit addresses with the actual address of where the data resides in RAM, and every RAM cell can store up to whatever the size of a cache line is, say 32bit? So there has to be some addressing logic to know where exactly to read data from right? – gmmk Jul 04 '23 at 19:31
  • It's not that simple. There is a thing called virtual memory which uses the disk drive as an extension of RAM. Then there is address layout randomization done by the OS to prevent processes from interfering with other processes. – NathanOliver Jul 04 '23 at 19:34
  • I literally have no idea what protocol it is used for DRAM and CPU to communicate, I'm always assuming there's an address line of 32 bit for the address, and another 32 bit data line to read/write data from/to. – gmmk Jul 04 '23 at 19:35
  • Take a look at this lecture https://www.youtube.com/watch?v=CDOOxhRBMIY and/or other lectures listed here https://safari.ethz.ch/digitaltechnik/spring2023/doku.php?id=schedule. It should answer all your questions – idmean Jul 04 '23 at 19:38
  • @gmmk what do you mean? For example [DDR4](https://en.wikipedia.org/wiki/DDR4_SDRAM) is the name of the communication standard. You can even read on the wiki about encoding. – freakish Jul 04 '23 at 19:40
  • @idmean Thanks, I'll take a look at that then. – gmmk Jul 04 '23 at 19:40
  • Oh ok so it is as though they've defined an 'interface' of how things have to work, then they're like: Programmer, you don't have to care about how it is implemented, just know it does this and that, no matter how it is actually implemented. – gmmk Jul 04 '23 at 19:47
  • Physically every bit of a word may go to a separate DRAM IC. Sequential data lines and address lines from CPU point of view may be reordered from DRAM point of view. As a software developer you shouldn't care about this. – dimich Jul 04 '23 at 21:13
  • It looks like you are asking a hardware question (or two) instead of a software question (i.e. "Not about programming or software development"). – JaMiT Jul 04 '23 at 21:18
  • This question should be migrated to https://electronics.stackexchange.com/ and reopened. – Gabriel Staples Jul 04 '23 at 23:02

1 Answer


At least for the moment, I'm going to discuss how things are structured on a typical PC server/desktop/tower/laptop.

Most of the time, your code is affected primarily by the structure of cache.

Ignoring that for the moment: with normal DRAM, you split the address into (at least) two parts, a row and a column, specifically to reduce the number of address lines from the CPU to memory. There may be more parts than just that, but at least a row and a column.

At least from a logical viewpoint, the words the CPU reads from/writes to memory are 64 bits wide. The bus itself may be 64 bits wide, or (for example) 8 bits wide, with a little bit of logic to break a 64-bit chunk into pieces at one end, and reassemble them at the other. A typical DDR4 DIMM will have 16 chips, each providing 4 of the total 64 bits. So, reads from/writes to DRAM are normally done in a minimum of 64-bit "chunks" (though there are lines to say which bytes are valid).

In reality, transfers are normally even larger than that. When your code tries to read an address, the read goes to the cache. If the data is in the cache, you just read it from the cache without touching DRAM at all. It only gets read from memory if it's not in the cache. In that case, the cache controller makes some space in the cache (by flushing a line of cache to memory), then reads an entire cache line in from DRAM to the CPU. Because of this, reading or writing DRAM is normally done in bursts--the CPU gives the address of the beginning of a cache line, then reads/writes as many 64-bit words as needed for an entire cache line without sending another address.

So, when you're reading or writing DRAM, it mostly happens in terms of entire cache lines, not individual bytes or words.

When we consider the structure of the DRAM itself, things get a little stranger still. As I already mentioned, the RAM is addressed in rows and columns. But that goes well beyond just addressing--the memory itself is structured in rows and columns. Over and above that, it's structured into the actual memory cells, plus what are called "sense amps" along the edge of the array of cells. There are really two parts to a sense amp. The first part is a comparator. When you read a bit from a cell, it compares the voltage to a reference: if it's lower, that's a 0; if it's higher, that's a 1. The output from each comparator is then stored in a flip-flop. So when you read data from the DRAM, there's an initial amount of time to read data from the actual DRAM cells into the sense amplifiers, then the data is transmitted from the sense amps to the outside world.

For a typical DRAM, you might have a thousand or more bits of sense amps, so when you issue one address, it reads a few thousand bits from the memory array into the sense amps. Then it can transmit all that data from the sense amps to the outside world quite quickly. This adds more motivation to read in bursts--you get an access pattern something like 6-1-1-1-1-1-1-1. That is, it takes six clocks to retrieve the first word, and one clock each for the next seven words in the row. This gives even more motivation to read or write bursts of memory instead of single words at a time--those subsequent words are much less expensive than the first one.

Jerry Coffin