
If I need a program to read/write data larger than 1 TB randomly, the simplest way is to put all the data into memory. On a PC with 2 GB of memory we can still manage by doing a lot of I/O. Different computers have different amounts of memory, so how can one program allocate a suitable amount of memory on machines ranging from 2 GB to 2 TB?

I thought of popen-ing /proc/meminfo and allocating MemFree, but I suspect there is a better way.


note:

  1. I use Linux, but answers for other OSes are welcome
  2. avoid being OOM-killed as much as possible (without root)
  3. as little disk I/O as possible
  4. use multiprocessing
  5. a C or C++ answer is fine
fusztal
  • [Why is malloc not "using up" the memory on my computer?](https://stackoverflow.com/q/19991623/2410359) also may apply here. – chux - Reinstate Monica Aug 01 '22 at 03:42
  • Using malloc in C++? Are you sure? Anyway, most bigger systems have something called virtual memory, so you can usually allocate more memory than the physical size. However, to get good performance it is usually better to work with chunks of data that not only fit in memory but also fit in the CPU cache. So the correct answer really depends on a better-defined question: what is it you want to do, how important is optimization, and what needs optimizing? – Pepijn Kramer Aug 01 '22 at 04:28
  • It's not clear what you mean by "putting all data into memory". If you have 1TB of data, you clearly can't put it all into the RAM of a PC with 2GB of RAM. Do you mean you want to `mmap()` a 1TB file and then rely on the PC's virtual memory subsystem to page parts of that file in and out of RAM as required? – Jeremy Friesner Aug 01 '22 at 05:13
  • @JeremyFriesner I mean, if we use a machine with 2 TB of memory, all the data can be put into memory, using `mmap` or something else. If we use a PC with 2 GB of memory, we have to choose a `len` to call `mmap(addr, len, ...)`. So my question is: what is the best `len`? – fusztal Aug 01 '22 at 05:51
  • @PepijnKramer Well, it doesn't have to be `malloc()`; I just want to know how much memory I should use. And yes, you're right, I should consider the CPU cache. But leaving that aside, this question is only about memory size. – fusztal Aug 01 '22 at 05:56
  • (From wikipedia) The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations. This allows up to 256 TiB (2^48 bytes) of virtual address space. But anything over the physical memory size will involve disk swapping (and that's a huge performance hit). So maybe https://stackoverflow.com/questions/2513505/how-to-get-available-memory-c-g – Pepijn Kramer Aug 01 '22 at 08:49
  • @fusztal I can't think of any reason (when using `mmap()` on a 64-bit system, at least) why `len` shouldn't always be the full 1TB size of the file; the virtual memory system should only page in the pages of the file that you actually access, and if physical RAM gets full, then the virtual memory system should handle that by flushing pages back to disk as necessary. – Jeremy Friesner Aug 01 '22 at 16:24
  • This is a classic [XY-problem](https://en.wikipedia.org/wiki/XY_problem) -- the answer is that you should use as much memory as you need to use, and no more. You do not and should not care how much memory the computer has. If an allocation/request fails, you need to deal with that, but that's always the case. If you have a big file you need random access to, just use mmap and don't worry about reading/writing OR allocating memory. – Chris Dodd Aug 02 '22 at 02:01
  • @JeremyFriesner I learned the mechanism of `mmap` and realized it doesn't trigger OOM killer. My problem is solved. Thank you. – fusztal Aug 02 '22 at 05:56
  • @JeremyFriesner And I'm a little confused about "By default, any process can be killed at any moment when the system runs out of memory.", which is from `man mmap`. – fusztal Aug 02 '22 at 06:05
  • @PepijnKramer Thanks for a helpful link. It seems the best way is using `mmap` and map all data I needed. – fusztal Aug 02 '22 at 06:07

1 Answer


You can use the GNU extension `get_avphys_pages()` from glibc. The glibc manual says:

The get_avphys_pages function returns the number of available pages of physical memory the system has. To get the amount of memory this number has to be multiplied by the page size.

Sample code:

#include <unistd.h>       /* getpagesize() */
#include <sys/sysinfo.h>  /* get_avphys_pages(), a GNU extension */
#include <stdio.h>

int main(void) {
    long pagesize = getpagesize();
    long avail_pages = get_avphys_pages();      /* available physical pages */
    long avail_bytes = avail_pages * pagesize;  /* pages * page size = bytes */
    printf( "Page size:%ld Pages:%ld Bytes:%ld\n",
        pagesize, avail_pages, avail_bytes );
    return 0;
}

Result (on Godbolt):

Program returned: 0
Page size:4096 Pages:39321 Bytes:161058816

This is the amount of available PHYSICAL memory in your box, so:

  1. The truly usable amount can be higher, since the process can page memory in and out.

  2. It is also only a ceiling, as other processes are using memory too, and the number changes from moment to moment.

So treat the result as a rough upper bound on the available RAM.

If you plan to allocate large chunks of memory, use mmap() directly, as malloc() would be too high-level for this usage.

Something Something