9

My question is similar to this; however, I'm asking something slightly different.

It is clear that the address of the first std::vector element can be used as a C-style array, which means that std::vector elements are contiguous in virtual memory. However, if physical memory is fragmented, the std::vector's storage may actually be split into many parts in physical memory.

My question is: Are std::vector elements contiguous in physical memory (as well as virtual memory)?

Community
ST3
  • @hsouza I added link to this in my question. – ST3 Aug 30 '13 at 20:27
  • 2
    Under the hood malloc uses mmap, so the question would be if mmap can give you contiguous physical memory. Have you looked at this: http://stackoverflow.com/questions/4401912/linux-contiguous-physical-memory-from-userspace – LarryPel Aug 30 '13 at 20:30
  • 11
    @user2623967 Virtual vs physical memory is not something C++ cares about. If you somehow escape the world of user space on a typical operating system, dive into the kernel and somehow poke around in virtual memory mappings, you're outside of what C++ cares about. Pretty much any memory of a user space process can be "fragmented" on page boundaries, but that's nothing a user space process needs to care about, whether it's a C++ std::vector or a plain array in C. – nos Aug 30 '13 at 20:30
  • 3
    I think the question is about process address space. Generally, the OS handles this stuff. The virtual address space is contiguous. Data stored in `std::vector` will span a contiguous range of virtual addresses. Where this data is _physically_ placed is up to the OS. – lapk Aug 30 '13 at 20:31
  • The answer I was looking for was in the link LarryPel gave in a comment. – ST3 Aug 30 '13 at 20:35
  • 1
    If you look at the memory in the virtual address space it will be contiguous, but he is asking whether the memory is mapped contiguously to physical memory... Since the mapping is done by page, if the vector's memory is larger than a page, I think it is not necessarily contiguous in physical memory. – LarryPel Aug 30 '13 at 20:35
  • If you are trying to use physically contiguous memory in user space (usually for some kind of hardware support) then YOU ARE DOING IT WRONG. You need to write an operating system level driver. – Zan Lynx Aug 30 '13 at 21:17
  • `std::vector<>` has a virtual table? Ah... never mind. Wrong vernacular on the OP's part. – WhozCraig Aug 30 '13 at 21:18
  • I so want to edit this question, but I am afraid the changes would be radical. consistent ⇒ contiguous, virtual table ⇒ virtual memory, memory ⇒ physical memory, etc. Oh, the heck with it... – jxh Aug 30 '13 at 21:42
  • Okay, I did the deed. If you think I totally botched it, feel free to rollback. – jxh Aug 30 '13 at 21:53
  • 2
    @JonathanLeffler: I wish I could +1 edit reasons. – Lightness Races in Orbit Aug 30 '13 at 21:57

3 Answers

19

The memory used to store the data in a vector must be at contiguous addresses as those addresses are visible to the code.

In a typical case on most modern CPUs/OSes, that will mean the virtual addresses must be contiguous. If those virtual addresses cross a page boundary, then there's a good chance that the physical addresses will no longer be contiguous.
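As a rough illustration of the point above, the following sketch checks that a vector's elements sit at consecutive virtual addresses and counts how many pages the storage touches. The helper names (`pages_spanned`, `is_virtually_contiguous`, `pages_spanned_by`) are made up for this example, and the page size is passed in as a parameter because it is system-dependent (4096 bytes is a common value, not a guarantee). Nothing here inspects physical memory; it only shows when a buffer crosses page boundaries, which is where physical contiguity can be lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of pages touched by the byte range [addr, addr + bytes).
// page_size is an assumption supplied by the caller and must be nonzero.
std::size_t pages_spanned(std::uintptr_t addr, std::size_t bytes,
                          std::size_t page_size) {
    assert(page_size != 0);
    if (bytes == 0) return 0;
    const std::uintptr_t first_page = addr / page_size;
    const std::uintptr_t last_page = (addr + bytes - 1) / page_size;
    return static_cast<std::size_t>(last_page - first_page + 1);
}

// True if element i is located exactly i elements past data(),
// i.e. the elements are contiguous in the addresses the program sees.
bool is_virtually_contiguous(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != v.data() + i) return false;
    }
    return true;
}

// Pages touched by the vector's element storage (virtual addresses only).
std::size_t pages_spanned_by(const std::vector<int>& v, std::size_t page_size) {
    return pages_spanned(reinterpret_cast<std::uintptr_t>(v.data()),
                         v.size() * sizeof(int), page_size);
}
```

Any vector whose storage spans more than one page is a candidate for being physically split; whether it actually is depends entirely on how the OS has mapped those pages.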

I should add that this is only rarely a major concern. Modern systems have at least some support for such fragmented memory usage, in many cases right down to the hardware level. For example, many network and disk controllers include "scatter/gather" capability: the OS uses the page tables to translate the buffer's virtual addresses to physical addresses, then supplies a number of physical addresses directly to the controller. The controller then gathers the data from those addresses if it's transferring from memory to peripheral, or "scatters" the data out to those addresses if it's transferring from peripheral to memory.
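To make the scatter/gather idea a little more concrete, here is a minimal sketch of the first half of that process: cutting one virtually contiguous buffer into per-page segments. The `Segment` type and `split_at_page_boundaries` function are hypothetical names invented for this illustration, and the page size is again a caller-supplied assumption. A real OS would additionally translate each segment's virtual address to a physical one through the page tables before handing the list to the controller; that step is kernel-side and is deliberately not modelled here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One piece of a buffer that lies entirely within a single page.
struct Segment {
    std::uintptr_t addr;  // virtual address where the piece starts
    std::size_t length;   // number of bytes in the piece
};

// Split [addr, addr + bytes) so that no segment crosses a page boundary.
// page_size must be nonzero; it is an assumption supplied by the caller.
std::vector<Segment> split_at_page_boundaries(std::uintptr_t addr,
                                              std::size_t bytes,
                                              std::size_t page_size) {
    std::vector<Segment> segments;
    while (bytes > 0) {
        // Bytes left before the end of the page containing addr.
        const std::size_t room =
            page_size - static_cast<std::size_t>(addr % page_size);
        const std::size_t chunk = std::min(bytes, room);
        segments.push_back({addr, chunk});
        addr += chunk;
        bytes -= chunk;
    }
    return segments;
}
```

Each resulting segment can live in a different physical frame, which is why the controller is given a list of addresses rather than a single base address and length.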

Jerry Coffin
8

No, there is no guarantee that you will be provided contiguous physical memory in C++'s abstract machine. Abstractions and hardware below malloc are free to use discontiguous memory.

Only your targeted implementation could make such a guarantee, but the language/model does not care. It relies on the system to do its job.

justin
1

Virtual to physical memory mapping is handled largely by the CPU, but with kernel support. A userland process cannot know what this mapping is: your program, no matter what the programming language, deals solely in virtual memory addresses. You cannot expect, nor is there any way of even finding out, if two adjacent virtual memory addresses that straddle a page boundary are adjacent in physical memory, so there is absolutely no point worrying about it.

Emmet
  • 3
    "nor is there any way of even finding out" - well, this in general is not true; typically every OS provides some way to find out (although it may be obscure or require custom drivers) http://stackoverflow.com/questions/6252063/simplest-way-to-get-physical-address-from-the-logical-one-in-linux-kernel-module http://stackoverflow.com/questions/366602/how-to-translate-a-virtual-memory-address-to-a-physical-address; on the other hand, I agree that there's no *standard* way to do such a thing. – Matteo Italia Aug 30 '13 at 22:46
  • OK, @Matteo, can you tell me how a userland program can determine the physical address that corresponds to an arbitrary pointer in C under Linux, and what use that information is? – Emmet Aug 30 '13 at 22:50
  • 1
    For example, interrogating a custom driver that performs the steps described in the first link above and returns the result to it; for the use, it may be the frontend for a kernel debugger, or a hack tool or whatever. My point is that saying that "there's no way to find out" is not correct, simply there's no *standard* or *simple* way. I know that it's nitpicking (and in fact, the +1 you have here is mine), but I don't like absolute statements which are incorrect. – Matteo Italia Aug 30 '13 at 23:05
  • But, @Matteo, you're suggesting writing a custom kernel driver to make the functionality available to userland, so you're not doing it in a “userland process” (which is how I qualified my statement) in any meaningful sense: you're actually doing it in a kernel module. – Emmet Aug 30 '13 at 23:13
  • 1
    With that logic, a userland process then cannot even write on a file, or display text on screen or do any meaningful work besides writing data inside its address space and wasting CPU cycles. – Matteo Italia Aug 30 '13 at 23:28
  • Still, I got what you mean, and in that sense you *are* correct. – Matteo Italia Aug 30 '13 at 23:39
  • @Matteo, there are userland APIs for all of those things that don't necessitate writing custom kernel modules. If our standard for things being available in userland is that it is technically possible to write a kernel module that exposes them, then absolutely *anything* in kernel space is accessible in userland, and the distinction between what can, and cannot, be done in userland is semantically vacuous. You've just defined away the difference between userland and kernel space, which I think is a useful distinction, so I hope you'll understand if I demur. – Emmet Aug 30 '13 at 23:53
  • 2
    I agree, I went too deep in nitpicking :) ; just to clean this mess up, my original point was: it's false that there's no way to know the virtual=>physical mapping, since the OS knows and the process can ask the OS (and in principle there's no reason why it shouldn't provide this information - it's just mostly useless); however it is *true* that a process in ring 3 cannot know this mapping by its own means, so, if there's no API or driver available to provide you this information, yes, it has no way to know. I think we can both agree with this, the rest are "semantic" problems. – Matteo Italia Aug 31 '13 at 00:15
  • 1
    @MatteoItalia, yes, I understand and agree that it is, in principle, possible to expose this information to userland, but what conceivable purpose would it serve? Most of the time, the *kernel* doesn't even care about the exact values in the TLBs and just walks the page tables when the CPU asks it to. I guess the reason that this information isn't exposed to userland is that you couldn't really do anything useful with it if you had it. To be honest, I'd actually quite like to have that facility as a pure curiosity, useless or not. – Emmet Aug 31 '13 at 00:30
  • 1
    Well, for that you'd probably have to ask the OP :) – Matteo Italia Aug 31 '13 at 00:37
  • It'd be an interesting little project! – Emmet Aug 31 '13 at 00:40
  • I'm a bit late to the party, but I guess we can infer whether two adjacent memory addresses are on the same page or not through timing. For example, we can first make sure that these addresses are not in RAM (by reading a huge amount of data that is at least one page away from these addresses, to make sure these pages are evicted), then time two reads to these locations and see whether the time looks more or less like that of a single hard page fault or of two hard page faults. – Olivier Sohn Jan 22 '20 at 14:04
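
On the question debated in the comments above, of whether a process can ask the OS for the virtual-to-physical mapping: Linux specifically exposes a per-process file, `/proc/self/pagemap`, holding one 64-bit entry per virtual page. The sketch below is a non-authoritative illustration of reading and decoding such an entry. The names `PagemapEntry`, `decode_pagemap_entry` and `read_pagemap_entry` are invented for this example, the bit layout (bit 63 = page present, bits 0-54 = page frame number when present) follows the kernel's pagemap documentation as I understand it, and the page size is a caller-supplied assumption. Recent kernels report the frame number as zero to unprivileged processes, so without elevated privileges this mostly confirms the point made in the thread that the information is not generally available to ordinary userland code.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>

// Decoded view of one pagemap entry (only the two fields used here).
struct PagemapEntry {
    bool present;       // bit 63: page is currently in RAM
    std::uint64_t pfn;  // bits 0-54: page frame number, meaningful if present
};

// Pure bit manipulation; does not touch the file system.
PagemapEntry decode_pagemap_entry(std::uint64_t raw) {
    PagemapEntry entry;
    entry.present = ((raw >> 63) & 1u) != 0;
    entry.pfn = raw & ((std::uint64_t{1} << 55) - 1);
    return entry;
}

// Linux-only: read the raw entry for the page containing vaddr.
// Returns false if the file is unavailable or the read fails.
bool read_pagemap_entry(std::uintptr_t vaddr, std::size_t page_size,
                        std::uint64_t& raw_out) {
    if (page_size == 0) return false;
    std::ifstream pagemap("/proc/self/pagemap", std::ios::binary);
    if (!pagemap) return false;
    // Entries are indexed by virtual page number, 8 bytes each.
    const std::uint64_t index = vaddr / page_size;
    pagemap.seekg(static_cast<std::streamoff>(index * sizeof(std::uint64_t)));
    pagemap.read(reinterpret_cast<char*>(&raw_out), sizeof raw_out);
    return static_cast<bool>(pagemap);
}
```

Comparing the frame numbers of two adjacent virtual pages (when they are readable) would show whether those pages happen to be physically adjacent at that moment; the OS is free to change the mapping later.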