On x64, how does the Linux kernel access the data segment? Does it use -mcmodel=large during compilation?

Question

I'm writing a minimal x86-64 kernel from scratch and I am having some design issues.

Based on the comments and the link provided by stark, I decided to rephrase my question. I want to use the Linux kernel as a model for designing my own kernel and would like some advice.

I know that compiled C++ code uses RIP-relative addressing by default to access the executable's data segment (all global and static variables). RIP-relative addressing is limited to a signed 32-bit offset, so the data must lie within 2GB of the instruction that accesses it.

I also know (from stark's comment) that the Linux kernel starts its code segment at 0xffff_ffff_8000_0000 (https://www.kernel.org/doc/html/latest/x86/x86_64/mm.html):

Start addr       |   Offset   | End addr         |  Size   | VM area description
ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0

If the code segment of the Linux kernel is more than 2GB away from most of its data segment, how does it access that data, given that RIP-relative addressing cannot reach it?
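To make the 2GB limit concrete, here is a minimal sketch of the reachability check, not anything taken from the kernel. The function name `rip_relative_reachable` and the `NEARBY_DATA` address are hypothetical; the two base addresses are the ones quoted in this question.

```python
# Sketch: can an instruction reach `target` with a RIP-relative operand?
# The displacement is a signed 32-bit value added to the address of the
# *next* instruction.

INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def rip_relative_reachable(next_rip: int, target: int) -> bool:
    """Return True if target - next_rip fits in a signed 32-bit displacement."""
    disp = target - next_rip
    return INT32_MIN <= disp <= INT32_MAX


KERNEL_TEXT = 0xFFFF_FFFF_8000_0000       # kernel text mapping from the table above
NEARBY_DATA = KERNEL_TEXT + 0x0100_0000   # hypothetical symbol 16 MiB past the text base
UPPER_HALF = 0xFFFF_8000_0000_0000        # start of the upper half of the address space

print(rip_relative_reachable(KERNEL_TEXT, NEARBY_DATA))  # True
print(rip_relative_reachable(KERNEL_TEXT, UPPER_HALF))   # False
```

The second call fails because the two addresses are roughly 128 TB apart, far beyond what a 32-bit displacement can express.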

I think that the -mcmodel=kernel code model sign-extends 32-bit absolute addresses to 64 bits, which lets the executable reach the upper 2GB of the virtual address space without -mcmodel=large. That doesn't seem to help, since the kernel's data segment is not in that region. Meanwhile, -mcmodel=large makes the executable access the data segment through 64-bit absolute addresses, which makes the kernel slower and much bigger.
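As a rough illustration of what "sign-extended 32-bit absolute address" means, the sketch below checks which 64-bit addresses survive a round trip through a 32-bit field. The helper names `sign_extend32` and `encodable_as_simm32` are made up for this example.

```python
# Sketch: which 64-bit addresses can be written as a sign-extended 32-bit value?

MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sign_extend32(imm32: int) -> int:
    """Widen a 32-bit field to 64 bits by copying bit 31 into bits 32..63."""
    imm32 &= 0xFFFF_FFFF
    if imm32 & 0x8000_0000:
        return imm32 | 0xFFFF_FFFF_0000_0000
    return imm32


def encodable_as_simm32(addr: int) -> bool:
    """True if truncating addr to 32 bits and sign-extending gives addr back."""
    return sign_extend32(addr) == (addr & MASK64)


print(encodable_as_simm32(0xFFFF_FFFF_8000_0000))  # True: bottom of the top 2GB
print(encodable_as_simm32(0xFFFF_FFFF_FFFF_FFFF))  # True: top of the address space
print(encodable_as_simm32(0xFFFF_FFFF_7FFF_FFFF))  # False: just below the top 2GB
print(encodable_as_simm32(0xFFFF_8000_0000_0000))  # False: start of the upper half
```

Only the top 2GB (and the bottom 2GB) pass the check, which is why a kernel built this way has to be linked into the 0xffff_ffff_8000_0000 region.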

How does the Linux kernel access the data segment and does it use a large code model to access the 0xffff_8000_0000_0000 region of the virtual address space?

How do you figure? Current implementations support up to 258 TB of RAM. It's a bit difficult to show where your calculations and assumptions went wrong since you didn't show them. — DevSolar, Jan 07 '22 at 15:42
Here's the answer from the site for questions about Unix and Linux: https://unix.stackexchange.com/questions/509607/how-a-64-bit-process-virtual-address-space-is-divided-in-linux — stark, Jan 07 '22 at 15:54
Well, it is easy to calculate that the 2GB of kernel space leaves basically 1+1+(512*512) page tables. Which comes down to 2GB. Basically, the first 1 is for the PML4, then the second is for the PGD (which maps 512GB). Then, you add 512*512 page tables because you have 512 PDs which each map 1GB, and for each PD you have 512 PTs which each map 2MB. Or something like that. I don't remember the calculation well. It could be higher than 128GB but definitely not 258TB. Maybe around 1024GB or so. — user123, Jan 07 '22 at 15:54
@stark This doesn't answer the question. I'm running a Linux kernel currently and this is not the map of my current kernel. Basically, if the Linux kernel is compiled with -mcmodel=large, gcc compiles with 64 bits absolute addresses which leaves the kernel with slower memory accesses and such. If the kernel isn't compiled with -mcmodel=large it can use -mcmodel=kernel which sign extends the 32 bits absolute addresses to 64 bits but leaves the kernel with 2GB of the upper portion of virtual memory. My question is more about that than the actual layout of the memory. — user123, Jan 07 '22 at 15:59
Ok, if you take, for example, a page table that's found in the beginning of 0xffff_8000_0000_0000. Then, the code is at ffff_ffff_8000_0000 (because this is where my kernel resides currently). How does the code access this page table? It cannot use RIP-Relative addressing because this is much further than 2GB and RIP-relative is limited to 32bits offsets. It also cannot use a 32 bits sign extended address so it must use a 64 bits absolute address. I'm wondering if this is what the Linux kernel is doing or otherwise how does it achieve accessing the 0xffff_8000_0000_0000 region. — user123, Jan 07 '22 at 16:19
"Easy to calculate", but you calculated wrongly. One page is 4 MB, for starters. You *can* still use 4 KB pages, but no-one does these days for obvious reasons. I suppose you re-read the available documentation on paging again, and perhaps come up with a more precise question than "how does it work". — DevSolar, Jan 07 '22 at 16:23
My current Linux kernel uses 4KB pages. Just run `getconf PAGESIZE` on any Linux machine and it will return 4096. — user123, Jan 07 '22 at 16:26
It is not a "how does it work" question. Maybe you wonder "how does it work" because you just said that pages are 4MB, which is plainly wrong, and you probably don't understand anything about memory models and memory addressing. Maybe you should simply leave the question for someone else to answer because you simply don't understand it. — user123, Jan 07 '22 at 16:28
You know why PAGESIZE is a queryable value? Because it can be **configured**. To be exact, it's the setting of the PS bit (#7) in the Page Directory Entry, which selects between 4 KB / page (unset) and 4 MB / page (set). You know how I know that? Because I've got the Intel Hardware Manuals here on my bookshelf, and was not afraid to look it up. You, on the other hand, come here with a lot of assumptions, and start mouthing off at people when given advice. That's not good behavior for people looking for information. — DevSolar, Jan 07 '22 at 16:37
All in all, memory management is defined by the CPU/MMU architecture, not what kernel XY does or does not do. x86_64 supports 48 bits of addressing, and whether the kernel is at the top of that or not makes no difference. I suggest (again, and concluding) that you refer to the readily available information on the x86_64 CPU architecture. — DevSolar, Jan 07 '22 at 16:41
Most people know (and you don't seem to), that the Linux kernel uses 4KB pages on all major distributions of Linux. Even though configurable, 4MB pages are a very bad idea because the granularity of allocation becomes very big and basically you are left with a minimum of 4MB allocated to your user mode process with the standard library implementation having no choice but to manage this page size. Most processes (like a small program) take up like less than one page or so. — user123, Jan 07 '22 at 16:41
This is why the Linux kernel compiles by default with 4KB pages. — user123, Jan 07 '22 at 16:42
"All in all, memory management is defined by the CPU/MMU architecture, not what kernel XY does or does not do. x86_64 supports 48 bits of addressing, and whether the kernel is at the top of that or not makes no difference." It makes a big difference to me because I am currently writing a kernel and would like advice on how to compile my code and with what code model. — user123, Jan 07 '22 at 16:43
If you are writing your own kernel, what **Linux** does is doubly irrelevant, and I **again** urge you to actually look up the actual documentation. — DevSolar, Jan 07 '22 at 16:45
You can also rest assured that I read the documentation carefully and couldn't find the answer to my question, which is why I came here. Intel's and AMD's documentation doesn't give advice on writing a kernel. It describes how the processor works from a software perspective. I already know what you are telling me. — user123, Jan 07 '22 at 16:45
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/240835/discussion-between-devsolar-and-user123). — DevSolar, Jan 07 '22 at 16:46
What Linux does is not irrelevant because it is basically a very good example of a good mainstream open source kernel that works very well. — user123, Jan 07 '22 at 16:46
FWIW, close vote retracted. In its current iteration (which is far removed from the original or this comment thread) Q is answerable. — DevSolar, Jan 07 '22 at 21:05

score 1 · Accepted Answer · answered Jan 07 '22 at 18:41

I think the confusion is between the gcc code model and the 64-bit CPU's MMU. The kernel code model generates code that addresses symbols with signed 32-bit offsets, which means all symbols in the kernel image must fit in the top 2GB of the address space. This does not change the fact that virtual-address pointers in the kernel are 64 bits wide, of which 48 or so bits are significant. Anything in the kernel or the current user space can therefore be reached indirectly through a pointer, with the page tables and MMU doing the translation.
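For the "48 or so bits" part, a small sketch of the canonical-address rule may help. It assumes 4-level paging (48-bit virtual addresses); the name `is_canonical48` is hypothetical.

```python
# Sketch: with 48-bit virtual addresses, bits 63..47 of a pointer must all be
# equal (all zeros or all ones) for the address to be canonical.
# Assumption: 4-level paging; 5-level paging widens this to 57 bits.


def is_canonical48(addr: int) -> bool:
    """True if bits 63..47 of a 64-bit address are all zeros or all ones."""
    top = (addr & 0xFFFF_FFFF_FFFF_FFFF) >> 47  # the 17 bits 63..47
    return top == 0 or top == 0x1_FFFF


print(is_canonical48(0x0000_7FFF_FFFF_FFFF))  # True: top of the lower half
print(is_canonical48(0xFFFF_8000_0000_0000))  # True: bottom of the upper half
print(is_canonical48(0xFFFF_FFFF_8000_0000))  # True: kernel text mapping
print(is_canonical48(0x0000_8000_0000_0000))  # False: inside the non-canonical hole
```

Both 0xffff_8000_0000_0000 and 0xffff_ffff_8000_0000 are valid 64-bit pointers. The code model only restricts where the kernel's own symbols may be linked, not which addresses a pointer may hold.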

stark

Ok, thank you for the answer. Basically, the kernel lies in the upper 2GB, including the data segment. What if it needs to access the lower portion of the upper half of the virtual address space (around 0xffff_8000_0000_0000)? It cannot use a 32-bit absolute address since this portion is inaccessible that way. – user123 Jan 07 '22 at 19:13
Not sure what you mean. All addresses are 64-bit. References to symbols in the code can use 32-bit relative addresses because the kernel is smaller than 2G. References to objects outside the code use 64-bit pointers. – stark Jan 07 '22 at 19:46
Ok, thanks again. I think I'm good to go. Sorry for all the confusion. I really get mixed up sometimes. It can be confusing when you begin to write a kernel. At first, my kernel was in the low address space and everything was identity mapped so I didn't need to bother much about paging and this memory model stuff. – user123 Jan 07 '22 at 19:55
Think of it this way. The memory model can generate smaller code by limiting the size of the compiled image, but doesn't affect how the hardware maps virtual to physical addresses. – stark Jan 07 '22 at 20:03
Yes my confusion came from a past question I asked about a linking error when trying to link an assembly file with my kernel. Basically, I had to give up on RIP-relative addressing because of this issue (that I have yet to fully understand because I didn't actually get a full answer there). I did have a discussion with Peter Cordes who proposed to use a kernel code model. This fixed the issue but I had to reload the kernel to 0xffffffff80000000 because of the absolute addressing involved with this code model. I got quite confused afterwards. I think I'm good to go from there. – user123 Jan 07 '22 at 20:15
The past question is here: https://stackoverflow.com/questions/68860310/why-do-i-get-error-ld-failed-to-convert-gotpcrel-relocation-relink-with-no-r. If you want to have a look maybe you have an answer there also. – user123 Jan 07 '22 at 20:17
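The code-size point raised in these comments can be illustrated by assembling the relevant load instructions by hand. This is only a sketch: the helper names are hypothetical, the target register is fixed to `rax`, and the 64-bit sequence is one plausible way to load a global through a full absolute address, not necessarily what a given compiler emits.

```python
import struct

# Sketch: byte encodings of three ways to load a 64-bit global into rax.


def load_rip_relative(disp32: int) -> bytes:
    """mov rax, [rip + disp32]  ->  48 8B 05 <disp32>"""
    return bytes([0x48, 0x8B, 0x05]) + struct.pack("<i", disp32)


def load_abs32(addr: int) -> bytes:
    """mov rax, [simm32]  ->  48 8B 04 25 <imm32>; the CPU sign-extends the imm32."""
    return bytes([0x48, 0x8B, 0x04, 0x25]) + struct.pack("<I", addr & 0xFFFF_FFFF)


def load_abs64(addr: int) -> bytes:
    """movabs rax, imm64 ; mov rax, [rax]  ->  48 B8 <imm64> 48 8B 00"""
    return bytes([0x48, 0xB8]) + struct.pack("<Q", addr) + bytes([0x48, 0x8B, 0x00])


SYMBOL = 0xFFFF_FFFF_8000_1000  # hypothetical symbol inside the top 2GB

print(len(load_rip_relative(0x1000)))  # 7
print(len(load_abs32(SYMBOL)))         # 8
print(len(load_abs64(SYMBOL)))         # 13
```

The 32-bit forms are only usable because the symbol sits within reach. A pointer loaded this way can still hold any 64-bit value, such as an address near 0xffff_8000_0000_0000, and be dereferenced with an ordinary register-indirect load.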
