134

In a book I read the following:

32-bit processors have 2^32 possible addresses, while current 64-bit processors have a 48-bit address space

My expectation was that if it's a 64-bit processor, the address space should also be 2^64.

So I was wondering what is the reason for this limitation?

er4z0r
    The book must have been talking specifically about the current implementation of the AMD64 architecture (x86-64). Only the low-order 48 bits are used. This is not a hardware limitation, though--all 64 bits are available. – Cody Gray - on strike Jul 16 '11 at 11:12
    Always a good idea to identify the book. – H H Jul 16 '11 at 11:14
    I'm guessing that physical address lines aren't free (you need 16 extra cpu pins at least). And i'm not aware of any hardware that can fill a 48 bit space with physical RAM chips on the same processor yet. When this becomes feasible, i'm sure AMD will add the missing 16 pins :) – Torp Jul 16 '11 at 11:17
  • @Cody: I know what you mean, but isn't it *exactly* a hardware limitation, if it is specific to the current implementation of AMD64? ;) – jalf Jul 16 '11 at 11:17
    Even `The 32-bit processors have 2^32 possible addresses` is not necessarily true: there can be a 32-bit CPU with only 24 "pins" for addressing memory. E.g. the 68EC020 (a cheaper 68020 version) is a 32-bit CPU but with 24 bits for addressing memory. – ShinTakezou Jul 16 '11 at 11:53
    There's a very real problem with 64-bit physical addressing, the virtual memory page size is too small. Which makes for enormous page directories and extremely expensive TLB cache flushes on every context switch. Moving from 4KB to 4MB pages is an option but very incompatible with current operating systems. – Hans Passant Jul 16 '11 at 13:32
    Furthermore, `The 32-bit processors have 2^32 possible addresses` is rather vague; for example, a number of 32-bit x86 CPUs (typically server/workstation) support PAE, which allows for a 36-bit physical address space. A number of modern x86_64 CPUs support a 48-bit physical address space and a 52-bit virtual address space. – user314104 May 17 '14 at 07:28
    @HansPassant Can you expand on that? I'm not quite sure what you mean. How does the size of an OS's individual page size relate to the address space? How does changing the page size help with increased physical address space? – Aaron Franke Nov 27 '16 at 01:19
    @AaronFranke: for a fixed page size and number of TLB entries, you can only cover a fixed amount of virtual working set size. With more RAM, that's an ever smaller fraction of available memory, and as memory bandwidth improves you're chewing through that set faster and getting more TLB misses. But I think Hans was primarily talking about the amount of space needed just for page tables to tell the CPU where the phys page is for each virtual page. With more deeply nested page tables, page walks are more expensive, and larger TLBs cost more if you flush them all. – Peter Cordes Apr 12 '20 at 03:10
  • Plain 32b could do direct 32:32 translation. 32b PAE had 36b address space but needed 40b in the tables, padded to 64b for alignment. The amd64 architecture standard just added one more level of tables to the old PAE system. Bringing the address space to 48b while being simple to quickly implement at both hardware and software levels while still fitting inside the allotted 64b. To extend the page space to 64b would need either substantial engineering and backward compatibility problems or using more than 64b in page tables. 52b phys was AMD's choice based on the 48b space plus a small buffer. – Max Power Dec 17 '22 at 23:08
  • The book this is quoted from is "Hacking: The Art of Exploitation (2nd Edition)" by Jon Erickson. Although it seems to have been changed after the first printing/publication, because my printed copy (fifteenth printing) matches the quote in the question, but a PDF copy of the book that I have says that 64-bit processers also have 2^64 possible addresses, instead of 48. – Erik Swan Apr 07 '23 at 17:16

10 Answers

176

Because that's all that's needed. 48 bits give you an address space of 256 terabytes. That's a lot. You're not going to see a system that needs more than that any time soon.

So CPU manufacturers took a shortcut. They use an instruction set which allows a full 64-bit address space, but current CPUs only use the lower 48 bits. The alternative was wasting transistors on handling a bigger address space that wasn't going to be needed for many years.

So once we get near the 48-bit limit, it's just a matter of releasing CPUs that handle the full address space, but it won't require any changes to the instruction set, and it won't break compatibility.
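For the curious, the number of address bits a given chip actually implements can be queried at runtime. Here is a minimal sketch for x86-64 with GCC/Clang, using CPUID leaf 0x80000008 (the numbers printed will of course vary by CPU; many current parts report 39-48 physical bits and 48 virtual bits):

```c
#include <stdio.h>
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */

int main(void) {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
        unsigned int phys_bits = eax & 0xff;         /* EAX[7:0]  = physical address bits */
        unsigned int virt_bits = (eax >> 8) & 0xff;  /* EAX[15:8] = linear (virtual) address bits */
        printf("physical address bits: %u\n", phys_bits);
        printf("virtual address bits:  %u\n", virt_bits);
    } else {
        printf("CPUID leaf 0x80000008 not supported\n");
    }
    return 0;
}
```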

jalf
    640kb is enough for anyone. –  Jul 16 '11 at 11:35
    Are you still running an 8088 system, bdares? – Joe Jul 16 '11 at 11:39
    @bdares: Bad analogy. The 8088/8086 arch's instruction set has a 640k limit built into it. Only making a new ISA (386) was it possible to break the barrier. x86_64 on the other hand supports all 64 bits in the ISA. It's just the current-generation hardware that can't make use of them all... – R.. GitHub STOP HELPING ICE Jul 16 '11 at 12:29
    @R. Actually, the limitation in the CPU was one megabyte. The IBM PC designated a section of that for memory mapped peripherals, BIOS, etc. Some other 8088/8086 designs (Zenith Z100, if memory serves) designated less for peripherals and such, and correspondingly more for application programs. – Jerry Coffin Jul 16 '11 at 19:30
    Anyone know enough about CPU architect-ing to provide a more detailed answer? Specifically providing more details: _So CPU manufacturers took a shortcut. They use an instruction set which allows a full 64-bit address space, but current CPUs just only use the lower 48 bits. The alternative was wasting transistors on handling a bigger address space which wasn't going to be needed for many years._ (For example, 10% more transistors to cpu or such and such component). I can think of some really interesting memory management schemes and side effects if allowed to have a huge address space. – Bryan Buckley Oct 08 '14 at 16:22
    @BryanBuckley What, specifically, is your question? There's a cost on the CPU side to handle larger address spaces (in particular, larger page tables). CPU designers try to balance that with the benefit you get from the larger address space. And 48 bits is a pretty huge address space, isn't it? – jalf Oct 08 '14 at 16:34
  • @jalf I want to know _what is the cost_ on the CPU side. A simple, specific example I would understand is what an AArch64 iteration (ARMv9) would look like where a ~64b (or even 80, 96, etc.) address space is supported (and what the resulting % transistor-count increase would be for a CPU like the A57, in this example). 48b certainly handles addressing physical memory and traditional, real-world address space usage, but is not much address space to support some novel ideas (memory management schemes and their side effects) in SW. – Bryan Buckley Oct 09 '14 at 08:57
    http://lwn.net/SubscriberLink/655437/9a48cd3e7a8cbe8a/ <-- three years after this reply, we are already hitting these limits :) The HP Machine will have 320TB of memory and they can't provide it as a flat address space because of the 48-bit addressing limitation. – agam Aug 28 '15 at 19:27
    @agam Ooh, that's interesting. Luckily, there's nothing stopping CPU manufacturers from enabling use of longer addresses. Perhaps it won't be long before they start using some of the remaining bits then. :) – jalf Aug 28 '15 at 20:13
  • BryanBuckley there's no simple answer. It depends. No doubt they could implement it with relatively few extra transistors, but then it might be slower than if they allowed themselves to use *a lot* more transistors. It's a trade-off. And until very recently, it was a trade-off where CPU manufacturers saw absolutely no gain, no *reason* to spend a single transistor on it. As agam showed in the comment above that might be about to change – jalf Aug 28 '15 at 20:15
    http://os.phil-opp.com/entering-longmode.html#paging explains how x86_64 does **paging**, the mapping of virtual memory addresses (used by CPU instructions) and physical addresses (hardware). One memory "page" is 4 kilobytes long. The low 12 bits of a pointer point within a page. Pages are found in a four-level trie, each level with a 512-entries (9-bit) table. 12 + 4 * 9 gives a total of 48 bits (256 terabytes) of mappable virtual memory. *Physical* pages addresses can use up to 52 bits (4 petabytes) since the upper 12 bits are reserved for things like marking a page "non-executable". – Simon Sapin May 30 '16 at 14:39
    Currently, the x86-64 architecture uses a four level paging hierarchy, where each level of the tree handles 9 bits of the address space. The nine comes from 512 64-bit entries fitting on each 4KB page. This limits the linear address space to 48 bits (256TB). Another level in the hierarchy would extend that to 57 bits (128PB). The format of the page table has the XD (execute disable) and PKEY (protection key) in the highest bits, but the physical address range can be extended to 59 bits with no changes to paging structures (512PB). – doug65536 Nov 12 '16 at 10:45
  • Well, it's 128 TB for a signed value, and judging by the fact that many 32-bit operating systems had some limitations at 2 GB (or at 3.5 GB, or other values below 4 GB), it'd be more safe to say that you wouldn't want to go over 128 TB with 48 bits of address space. – Aaron Franke Nov 27 '16 at 01:14
    @AaronFranke I'm not sure about other OSes, but I believe 32-bit Windows limited programs to 2 GB both to make it less likely that they'd be able to inadvertently corrupt kernel memory if something went seriously wrong, and to make it easier to implement the memory manager. (From what I gather, the OS reserves the upper 2 GB of a virtual address space for itself, while making the lower 2 GB available for user processes; all OS processes share the kernel space, while each process gets its own user space. It doesn't directly correlate to physical memory.) – Justin Time - Reinstate Monica Jul 08 '17 at 03:03
  • ...The issue with this is that using the lower 2 GB for user processes means that all addresses available to them will fit a signed `int`; regardless of whether Windows itself actually uses signed values or not, there's no reason that 32-bit programs that don't explicitly recognise more than the regular 2 GB space can't. This can (and has) come back to bite people in the rear while trying to update programs from 32-bit to 64-bit addresses, so people will have _hopefully_ learned from experience and it won't be an issue in the future. If so, then programs should be fine with more than 128 TB. – Justin Time - Reinstate Monica Jul 08 '17 at 03:18
  • This tactic of the CPU manufacturers is not new: the Motorola 68000, released in 1979, had a 32-bit instruction set with a 24-bit address bus. – Manu Jul 13 '19 at 19:01
22

Any answer referring to the bus size and physical memory is slightly mistaken, since OP's question was about virtual address space not physical address space. For example the supposedly analogous limit on some 386's was a limit on the physical memory they could use, not the virtual address space, which was always a full 32 bits. In principle you could use a full 64 bits of virtual address space even with only a few MB of physical memory; of course you could do so by swapping, or for specialized tasks where you want to map the same page at most addresses (e.g. certain sparse-data operations).
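As a small illustration of how decoupled the two are, a process can reserve far more virtual address space than the machine has RAM; pages only consume physical memory once they are touched. A minimal Linux sketch (the 1 TiB size and the MAP_NORESERVE flag are just illustrative choices):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 1ULL << 40;   /* reserve 1 TiB of *virtual* address space */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    printf("reserved %zu bytes of virtual space at %p\n", len, p);
    ((char *)p)[0] = 1;          /* only now does this page get physical backing */
    ((char *)p)[1 << 30] = 2;    /* ...and this one, 1 GiB into the region */
    munmap(p, len);
    return 0;
}
```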

I think the real answer is that AMD was just being cheap and hoped nobody would care for now, but I don't have references to cite.

R.. GitHub STOP HELPING ICE
    "Being cheap" I guess you mean not adding pins that will never be used, not taking up chip space for transistors that won't be used and using the freed space to make existing instructions faster? If that's being cheap, I'm in! – Olof Forshell Jul 17 '11 at 05:16
  • The 80386 allows 2 * 4096 selectors each containing up to 4GB of memory (32TB total). The 80286 allowed 2 * 4096 selectors each containing up to 64KB (1GB). – Olof Forshell Jul 17 '11 at 05:24
    Non-linear segmented hacks do not count as address space in my book. There's no way for portable software to make any use of them. – R.. GitHub STOP HELPING ICE Jul 17 '11 at 06:00
  • @R.. - I thought the definition of portable software is that it *can*. :-) For example, C++ forbids comparing pointers into different arrays so that they can be in separate 4GB segments. – Bo Persson Jul 17 '11 at 07:48
    If your compiler actually generates huge pointers and loads a segment register for each memory dereference then yes. But in reality that's horribly slow, and instead everyone used small memory models and `__far` (or worse yet, `FAR`/`far`!) pointers... – R.. GitHub STOP HELPING ICE Jul 17 '11 at 13:04
  • @R.. the 2*4096*4GB is the original PVAM (Protected Virtual Address Mode) converted from 16 to 32 bit form. What your book says is, I guess, of any interest only to you. Also, the original 8086 memory scheme contained several variants such as small, compact, medium and large (expanded on by MS) so that you would NOT need to have huge pointers all over the place. I personally used the 16-bit medium model (>64KB code <=64KB data&stack) with appropriate 32-bit overrides to allow 40-50MB data areas. Bo Persson puts it aptly "that it can" - it MOST CERTAINLY "can." – Olof Forshell Jul 17 '11 at 17:19
  • @R.. nothing in the OP question made me think he's talking about virtual address space — About the other comments, couldn't one make special hardware (exploiting CPU features) so that its physical "address pins" address two (or N!) different "banks" according to some selector "mechanisms" (e.g. `out NUM, reg`), where each bank is the max allowed by the "pins"? (e.g. 2^48) Then, could we say the cpu can address N*2^48 physical RAM? I would still say that the max address space of that CPU allows for "only" 2^48 bytes of RAM. – ShinTakezou Jul 17 '11 at 19:06
  • @R.. One can't know for sure if virtual or physical address space was meant, but indication is strong that it's physical. https://en.wikipedia.org/wiki/X86-64#Physical_address_space_details – Thorsten Schöning Aug 06 '15 at 12:12
  • @R.. Nothing in CPU design is inherently "horribly slow", the pipeline could make segment loading and segment overrides as fast as necessary. Look at the push/pop instructions, they all create horrible dependency chains against modification of the stack pointer register. CPUs now have hardware which handles that and makes it a completely free operation and a total non-issue. – doug65536 Nov 12 '16 at 10:56
11

There is a more severe reason than just saving transistors in the CPU address path: if you increase the size of the address space you need to increase the page size, increase the size of the page tables, or have a deeper page table structure (that is more levels of translation tables). All of these things increase the cost of a TLB miss, which hurts performance.
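A rough sketch of that trade-off, assuming the x86-64 layout of 4 KiB pages and 8-byte page-table entries (so each level of the tree indexes 512 entries = 9 bits): every extra chunk of virtual address space has to come from either bigger pages or more levels of translation.

```c
#include <stdio.h>

int main(void) {
    const unsigned page_bits  = 12;   /* 4 KiB pages -> 12 offset bits    */
    const unsigned index_bits = 9;    /* 4096 / 8 = 512 entries per level */
    for (unsigned levels = 3; levels <= 5; levels++)
        printf("%u levels -> %2u translatable virtual-address bits\n",
               levels, page_bits + levels * index_bits);   /* 39, 48, 57 */
    return 0;
}
```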

Brendan
    [Intel is proposing a 5-level paging scheme](https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf) to extend from the current 48 bits to 57 bits. (Same 9 bits per level / 4k pages as current x86-64 page tables). Using 10 or 11 bits per level would have required changing the page-walk hardware, so this might not be the optimal design for huge memory, but it's a sensible extension for a dual-mode CPU that needs to also support maximum performance for 4-level tables in the current format. – Peter Cordes Dec 04 '17 at 02:22
    Of course, with 2M or 1G hugepages, it's only 4 or 3 levels of page tables from top level to a huge-page table entry instead of a page directory pointer. – Peter Cordes Dec 04 '17 at 02:28
11

Read the limitations section of the wikipedia article:

A PC cannot contain 4 petabytes of memory (due to the size of current memory chips if nothing else) but AMD envisioned large servers, shared memory clusters, and other uses of physical address space that might approach this in the foreseeable future, and the 52 bit physical address provides ample room for expansion while not incurring the cost of implementing 64-bit physical addresses

That is, there's no point implementing full 64 bit addressing at this point, because we can't build a system that could utilize such an address space in full - so we pick something that's practical for today's (and tomorrow's) systems.
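Just to put those numbers in context, a quick sketch of the sizes involved (2^52 is where the "4 petabytes" figure above comes from):

```c
#include <stdio.h>

int main(void) {
    printf("2^32 = %llu bytes (4 GiB)\n",   1ULL << 32);  /* classic 32-bit limit   */
    printf("2^48 = %llu bytes (256 TiB)\n", 1ULL << 48);  /* current virtual limit  */
    printf("2^52 = %llu bytes (4 PiB)\n",   1ULL << 52);  /* x86-64 physical format */
    /* 2^64 would be 16 EiB (18,446,744,073,709,551,616 bytes); shifting a
       64-bit value by 64 is undefined in C, so it is left as a comment. */
    return 0;
}
```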

Damien_The_Unbeliever
  • Where does the 4 come from in the 4 petabytes? If we're talking 64 address lines we should end up with the square of the address space made possible by 32 address lines which is 4 gigabytes. Square that and we should have 16, not 4 petabytes. Am I missing something? – Olof Forshell Jul 18 '11 at 11:06
    It comes from the current physical limit (52 bits) - the point being that we can't put enough RAM in a PC to support this restricted range, let alone what would be required for a full 64-bit address space. – Damien_The_Unbeliever Jul 18 '11 at 11:10
10

The internal native register/operation width does not need to be reflected in the external address bus width.

Say you have a 64 bit processor which only needs to access 1 megabyte of RAM. A 20 bit address bus is all that is required. Why bother with the cost and hardware complexity of all the extra pins that you won't use?

The Motorola 68000 was like this: 32-bit internally, but with 23 address pins (addressing 16-bit words, with byte-select strobes) and a 16-bit data bus. The CPU could access 16 megabytes of byte-addressable RAM, and loading the native data type (32 bits) took two memory accesses (each bearing 16 bits of data).
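A tiny sketch of the point that register width and address-bus width are independent: when the bus is narrower than the register, the high bits of an address simply never leave the chip. (The 24-bit mask below is illustrative of a 68000-class part, not of any specific bus logic.)

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t pointer = 0xA1234567;             /* full 32-bit register value      */
    uint32_t on_bus  = pointer & 0x00FFFFFF;   /* what 24 address bits can convey */
    printf("register: 0x%08X  address bus sees: 0x%06X\n", pointer, on_bus);
    return 0;
}
```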

    but 68000 is considered as a "16/32 bit" cpu, not "full" 32 bit cpu so one could say it has still a foot in the 16bit past; I've picked the 68020 as an example, since its low-cost 68EC020 version has 24 bit only for addresses, though the 68020 is a "full" 32 bit cpu... +1 to have considered this wonderful processor family! – ShinTakezou Jul 16 '11 at 11:59
  • @ShinTakezou: honestly, was the 80386SX a 16-bit CPU (because it had an address space like the 80286) or was it 32-bit (because it had the internal architecture of an 80386DX)? One could say as you do but another (this one) says "internal is what counts" - and you can quote me on that. – Olof Forshell Jul 17 '11 at 17:24
  • @Olof I think that, in the context of the "memory" (which is the external world), external is what counts, so 68000 is a 16bit CPU (needing 2 "steps" to read 32 bit data) :D – ShinTakezou Jul 17 '11 at 19:24
  • @ShinTakezou: the memory context, even caches, is always external to the cpu itself even though they are extremely tightly coupled in modern processors. The 8088 was internally equal to the 8086 though it had eight data bus lines to the 8086's sixteen. I don't see what you apparently see as obvious, that the 8088 should be classified in the same group as the Z80, 8080, 8085 etc. The question of the width of the data bus seems trivial in that context – Olof Forshell Jul 18 '11 at 10:58
  • I am not an expert of such a matter at all,so I have nothing obvious to me.I wanted just to notice the need for a sharper cut with the past, where one could think 68000 is still an "old time" processor, so that it could seem "natural" that its address space is limited to less than 32 bit;while the 68020 can 32 bit, so that the existence of the 68EC020 with its limit makes clear that it's a choice not due to "limit of that (or this) time" but to other consideration (like to make it cheaper if there's no real advantage in having 64 pins), which is more or less the argument of this answer. – ShinTakezou Jul 18 '11 at 11:13
  • I take the view that the CPU "size" in bits is the native integer type, which is typically the register width. This generally defines the maximum and minimum mathematical extremes for that CPU, which are hard functionality limits. External interfaces are irrelevant - if they are smaller, it merely means multiple accesses to memory to obtain a native integer type, i.e. soft functional limits. –  Jul 19 '11 at 12:03
8

From my point of view, this results from the page size. Each 4096-byte page holds at most 4096 / 8 = 512 page-table entries (each entry is 8 bytes), and 2^9 = 512, so each level of the page table translates 9 bits of the address. With four levels plus the 12-bit offset within a page, that gives 9 * 4 + 12 = 48.
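A small sketch of that arithmetic from the other direction: splitting an (arbitrarily chosen) 48-bit virtual address into its four 9-bit table indices plus the 12-bit offset within the page.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t va = 0x00007f1234567abcULL;    /* hypothetical example address */
    unsigned pml4 = (va >> 39) & 0x1ff;     /* level-4 index (bits 47..39) */
    unsigned pdpt = (va >> 30) & 0x1ff;     /* level-3 index (bits 38..30) */
    unsigned pd   = (va >> 21) & 0x1ff;     /* level-2 index (bits 29..21) */
    unsigned pt   = (va >> 12) & 0x1ff;     /* level-1 index (bits 20..12) */
    unsigned off  =  va        & 0xfff;     /* byte offset within the page */
    printf("PML4=%u PDPT=%u PD=%u PT=%u offset=0x%03x\n", pml4, pdpt, pd, pt, off);
    return 0;
}
```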

linzuojian
6

Many people have this misconception, but I promise that if you read this carefully, it will be cleared up.

Saying that a processor is 32-bit or 64-bit does not mean it has a 32-bit or a 64-bit address bus, respectively. I repeat: it doesn't!

A 32-bit processor has a 32-bit ALU (Arithmetic and Logic Unit): it can operate on 32-bit binary operands (simply put, binary numbers 32 digits wide), and similarly a 64-bit processor can operate on 64-bit operands. So whether a processor is 32-bit or 64-bit does not tell you the maximum amount of memory that can be installed; it only tells you how large its operands can be. (For an analogy, think of a 10-digit calculator: it can compute results of up to 10 digits and cannot give you 11-digit or larger results. The analogy is in decimal rather than binary, but the idea is the same.)

What you are describing is the address space, that is, the maximum amount of directly addressable memory (RAM). The maximum possible RAM size is determined by the width of the address bus, not by the width of the data bus or of the ALU, which is what the 32-bit/64-bit label refers to. Yes, if a processor has a 32-bit address bus then it can address 2^32 bytes = 4 GB of RAM (or 2^64 bytes for a 64-bit address bus), but calling a processor 32-bit or 64-bit says nothing about that address space (how much memory it can reach, i.e. the maximum RAM size); the label depends only on the size of its ALU. Of course the data bus and the address bus may happen to be the same width, and then it may look as if "32-bit processor" means it can access 2^32 bytes = 4 GB of memory, but that is only a coincidence and does not hold in general. For example, the Intel 8086 is a 16-bit processor (it has a 16-bit ALU), so by that reasoning it should only be able to access 2^16 bytes = 64 KB of memory, yet it can address up to 1 MB because it has a 20-bit address bus. You can Google it if you have any doubts. :)

I think I have made my point clear. Now, coming to your question: since a 64-bit processor does not have to have a 64-bit address bus, there is nothing wrong with a 64-bit processor having a 48-bit address bus. They kept the address space smaller to make the design and fabrication cheaper, as nobody is going to use such a big memory (2^64 bytes), and 2^48 bytes is more than enough nowadays.
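As a concrete sketch of the 8086 case mentioned above: a 16-bit segment and a 16-bit offset combine into a 20-bit physical address as segment * 16 + offset (the values below are just the well-known reset vector, for illustration).

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t segment = 0xF000;
    uint16_t offset  = 0xFFF0;                             /* 8086 reset vector F000:FFF0  */
    uint32_t physical = ((uint32_t)segment << 4) + offset; /* shift by 4 = multiply by 10H */
    printf("%04X:%04X -> physical 0x%05X\n",
           (unsigned)segment, (unsigned)offset, (unsigned)physical);
    /* Prints F000:FFF0 -> physical 0xFFFF0, a 20-bit address. */
    return 0;
}
```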

hafiz031
  • I think you made your point very clear, there is one thing I don't understand though in what you said about the 16 bits 8086 CPU : how can a 16 bits CPU handle a 20 bits address ? Does it handle it through a 2 steps operation ? Even if the address bus is 20 bits width, once it gets to the CPU, the register width can obviously only take 16 bits ... How do they do that ? – programmersn Jan 01 '19 at 13:06
    Hmm...2 steps operation. Segment register contains only the upper 16 bits. Then it is multiplied by 10H to make it 20 bits and then the offset is added. – hafiz031 Jan 06 '19 at 02:27
3

To answer the original question: there was no need to add more than 48 bits of PA (physical address).

Servers need the maximum amount of memory, so let's try to dig deeper.

1) The largest (commonly used) server configuration is an 8-socket system. An 8S system is nothing but 8 server CPUs connected by a high-speed coherent interconnect (or simply, a high-speed "bus") to form a single node. There are larger clusters out there, but they are few and far between; we are talking about commonly used configurations here. Note that in real-world usage, a 2-socket system is one of the most commonly used servers, and 8S is typically considered very high end.

2) The main types of memory used by servers are byte-addressable regular DRAM (e.g. DDR3/DDR4 memory), Memory-Mapped IO - MMIO (such as memory used by an add-in card), and Configuration Space used to configure the devices present in the system. The first type is usually the biggest (and hence needs the largest number of address bits). Some high-end servers use a large amount of MMIO as well, depending on the actual configuration of the system.

3) Assume each server CPU (socket) can house 16 DDR4 DIMMs, each with a maximum size of 256 GB. (Depending on the version of the server, the number of possible DIMMs per socket is actually less than 16, but continue reading for the sake of the example.)

So each socket can theoretically have 16 * 256 GB = 4096 GB = 4 TB. For our example 8S system, the DRAM size can be a maximum of 4 TB * 8 = 32 TB. This means that the maximum number of bits needed to address this DRAM space is 45 (log2(32 TB) = 45).
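A quick sketch of that arithmetic (the DIMM count and size are the assumptions stated above, not limits of any particular product):

```c
#include <stdio.h>

int main(void) {
    unsigned long long dimm   = 256ULL << 30;   /* 256 GB per DIMM          */
    unsigned long long socket = 16 * dimm;      /* 16 DIMMs -> 4 TB/socket  */
    unsigned long long system = 8 * socket;     /* 8 sockets -> 32 TB total */
    unsigned bits = 0;
    while ((1ULL << bits) < system) bits++;     /* ceil(log2(system))       */
    printf("total DRAM = %llu bytes -> needs %u address bits\n", system, bits);
    return 0;
}
```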

We won't go into the details of the other types of memory (MMIO, MMCFG etc.), but the point here is that the most "demanding" type of memory for an 8-socket system with the largest DDR4 DIMMs available today (256 GB DIMMs) uses only 45 bits.

For an OS that supports 48 bits (WS16 for example), there are 48 - 45 = 3 bits left over. That means that if we used the lower 45 bits solely for 32 TB of DRAM, we would still have 2^3 times that much addressable space, which can be used for MMIO/MMCFG, for a total of 256 TB of addressable space.

So, to summarize: 1) 48 bits of Physical address is plenty of bits to support the largest systems of today that are "fully loaded" with copious amounts of DDR4 and also plenty of other IO devices that demand MMIO space. 256TB to be exact.

Note that this 256 TB address space (= 48 bits of physical address) does NOT include any disk drives like SATA drives, because they are NOT part of the address map; it includes only memory that is byte-addressable and exposed to the OS.

2) CPU hardware may choose to implement 46, 48 or > 48 bits depending on the generation of the server. But another important factor is how many bits does the OS recognize. Today, WS16 supports 48 bit Physical addresses (=256 TB).

What this means to the user is, even though one has a large, ultra modern server CPU that can support >48 bits of addressing, if you run an OS that only supports 48 bits of PA, then you can only take advantage of 256 TB.

3) All in all, there are two main factors to take advantage of higher number of address bits (= more memory capacity).

a) How many bits does your CPU HW support? (This can be determined by CPUID instruction in Intel CPUs).

b) What OS version are you running and how many bits of PA does it recognize/support.

The min of (a,b) will ultimately determine the amount of addressable space your system can take advantage of.

I have written this response without looking into the other responses in detail. Also, I have not delved in detail into the nuances of MMIO, MMCFG and the entirety of the address map construction. But I do hope this helps.

Thanks, Anand K Enamandram, Server Platform Architect Intel Corporation

  • This question is asking about 48-bit *virtual* address space size (requiring virtual addresses to be canonical). You do want more virtual bits than physical bits, so a high-half kernel can map all of physical memory into a single address space (it's own or user-space). As you say, HW only needs to implement as many PA bits as the DRAM controllers + MMIO can use, and can use any number up to the 52-bit limit in the x86-64 page-table format. ([Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)?](https://stackoverflow.com/q/46509152)) – Peter Cordes Oct 28 '18 at 17:24
    The 4-level page-table format also imposes the 48-bit VA limit, until HW + SW support PML5 page tables for 57-bit VAs. Anyway, this is a useful answer, but it seems to be posted under the wrong question. I'm not sure if there's a better place for it, so I guess we can leave it here, hopefully with an edit to add a header to say something about PA vs. VA. – Peter Cordes Oct 28 '18 at 17:29
2

It's not true that only the low-order 48 bits of a 64 bit VA are used, at least with Intel 64. The upper 16 bits are used, sort of, kind of.

Section 3.3.7.1 Canonical Addressing in the Intel® 64 and IA-32 Architectures Software Developer’s Manual says:

a canonical address must have bits 63 through 48 set to zeros or ones (depending on whether bit 47 is a zero or one)

So bits 47 thru 63 form a super-bit, either all 1 or all 0. If an address isn't in canonical form, the implementation should fault.
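A minimal sketch of that rule: an address is canonical under the 48-bit scheme exactly when bits 63 through 47 are all copies of bit 47, i.e. all zeros or all ones.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

static bool is_canonical_48(uint64_t va) {
    uint64_t top = va >> 47;             /* bits 63..47, 17 bits in total */
    return top == 0 || top == 0x1ffff;   /* all zeros or all ones         */
}

int main(void) {
    printf("%d\n", is_canonical_48(0x00007fffffffffffULL)); /* 1: top of the low half     */
    printf("%d\n", is_canonical_48(0xffff800000000000ULL)); /* 1: bottom of the high half */
    printf("%d\n", is_canonical_48(0x0000800000000000ULL)); /* 0: non-canonical hole      */
    return 0;
}
```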

On AArch64, this is different. According to the ARMv8 Instruction Set Overview, it's a 49-bit VA.

The AArch64 memory translation system supports a 49-bit virtual address (48 bits per translation table). Virtual addresses are sign- extended from 49 bits, and stored within a 64-bit pointer. Optionally, under control of a system register, the most significant 8 bits of a 64-bit pointer may hold a “tag” which will be ignored when used as a load/store address or the target of an indirect branch
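And a sketch of the sort of pointer tagging that the quoted top-byte rule enables. The tag_ptr/untag_ptr helpers and the 0x5a tag value are hypothetical, purely for illustration; without hardware top-byte-ignore the tag has to be stripped manually before every dereference, as done here.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define TAG_SHIFT 56   /* keep the tag in the top byte of the pointer */

static void *tag_ptr(void *p, uint8_t tag) {
    return (void *)(((uintptr_t)p & ~(0xffULL << TAG_SHIFT)) |
                    ((uintptr_t)tag << TAG_SHIFT));
}

static void *untag_ptr(void *p) {
    return (void *)((uintptr_t)p & ~(0xffULL << TAG_SHIFT));
}

int main(void) {
    int *x = malloc(sizeof *x);
    if (!x) return 1;
    int *tagged = tag_ptr(x, 0x5a);
    *(int *)untag_ptr(tagged) = 42;      /* strip the tag before dereferencing */
    printf("tag=0x%02x value=%d\n",
           (unsigned)((uintptr_t)tagged >> TAG_SHIFT), *x);
    free(x);
    return 0;
}
```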

Olsonist
    Only the lower 48 are significant, but the hardware checks that it's correctly sign-extended to 64 bits. IDK why they didn't specify zero-extension; maybe they wanted to make it more convenient to check for a high vs. low half address (by just checking the sign bit). Or maybe to avoid making the 2^48 boundary special, and so addresses near the top can conveniently fit in 32-bit sign-extended constants. I think the latter is more likely. – Peter Cordes Aug 09 '17 at 01:31
  • Anyway, current HW checking for canonical prevents software from using ignored bits for tagged pointers that will break on future HW, so it's part of the mechanism that makes it possible to extend future hardware if/when it's needed. (Which could be sooner rather than they expected, thanks to non-volatile memory being hooked up directly into physical and virtual address space.) – Peter Cordes Aug 09 '17 at 01:34
  • procfs on Linux on my Core i5 says that it gets mapped to 7ffd5ea41000-7ffd5ea62000. This address range makes sense according to the above 'canonical' rule. Bit 48-63 are 0 making it a correct canonical address. What's a little strange are some addresses in the Linux source. In include/asm/pgtable_64_types it says #define __VMALLOC_BASE _AC(0xff92000000000000, UL). This is NOT a canonical address. Such an address would start with 0xffff8. Dunno why. – Olsonist Oct 05 '17 at 16:47
  • Yeah, IIRC Linux uses the low half of the canonical range for user-space, and (mostly) uses the high half for kernel-only mappings. But some kernel memory is exported to user-space, like the `[vsyscall]` page. (That may be exporting stuff like current PID so that `getpid()` is purely user-space. Also `gettimeofday()` can just use rdtsc in user-space + scale factors exported by the kernel. Although some of that is I think in `[vdso]`, which is near the top of the bottom half.) – Peter Cordes Oct 05 '17 at 19:57
  • IDK what `__VMALLOC_BASE` does. Presumably it's not used directly. – Peter Cordes Oct 05 '17 at 19:58
  • Right, the low half would be 0 .. 0x7fffffffffff and the high half would be 0xffff800000000000 .. 0xffffffffffffffff. Those are the valid user and kernel canonical addresses. But Linus is using a non-canonical address and I think, as you say, he's not using it directly. Searching through the source, it's only used in the x86 KASLR stuff and I'm not going down that rabbit hole. As always, thanks. – Olsonist Oct 05 '17 at 20:10
0

A CPU is considered "N-bit" mainly based on its data-bus size and on a large part of its internals (internal architecture): registers, accumulators, the Arithmetic-Logic-Unit (ALU), the instruction set, etc. For example: the good old Motorola 6800 (or Intel 8050) CPU is an 8-bit CPU. It has an 8-bit data bus, an 8-bit internal architecture, and a 16-bit address bus.


  • Although an N-bit CPU may have some entities that are not N bits in size. For example, consider the improvements in the 6809 over the 6800 (both of them 8-bit CPUs with an 8-bit data bus). Among the significant enhancements introduced in the 6809 were the use of two 8-bit accumulators (A and B, which could be combined into a single 16-bit register, D), two 16-bit index registers (X, Y) and two 16-bit stack pointers.
Amit G.
  • There's already [an answer](https://stackoverflow.com/questions/6716946/why-do-x86-64-systems-have-only-a-48-bit-virtual-address-space/6716984#6716984) making this point with Motorola 68000 / 68020 as an example. This question is really about x86-64 specifically, not old 8 / 16-bit CPUs. In the case of x86-64, one of the major factors is that wider virtual addresses would need a deeper page table, and that factor didn't exist for the old chips you're talking about. – Peter Cordes Jul 22 '18 at 07:53
  • data-bus width doesn't have to match register or ALU width. For example, P5 Pentium has a 64-bit data bus (aligned 64-bit loads/stores are guaranteed to be atomic), but registers/ALUs are only 32 bit (except for the integrated FPU, and in the later Pentium MMX the SIMD ALUs.) – Peter Cordes Jul 22 '18 at 07:55
  • OP write: "My expectation was that if it's a 64-bit processor, the address space should also be 2^64." ........ You write: "This question is really about x86-64 specifically, not old 8 / 16-bit CPUs". ........ I think you missed the essence of OP question. OP question is an outcome of the wrong assumption that a 64-bits CPU should have a 64-bits address-bus. About the ALU, I wrote **big part** of its entities; Not all of them. – Amit G. Jul 22 '18 at 08:55
  • Stop spamming me by reposting this comment. Yes of course the OP is wrong for the reason you describe, but I was pointing out that your answer looks like it makes a similar mistake. You say "*and consequently big part of it's entities: Registers and Accumulators, Arithmetic-Logic-Unit (ALU) ...*", which sounds like you're saying that those things match the data bus width. The phrase "a big part" implies that you're saying *which* parts, not that it's only sometimes true for those parts. – Peter Cordes Jul 22 '18 at 08:57