If I do

ulimit -v 200000

and then run

sort largefile

I can see from top that sort uses at most 142232 KiB Virt and 92764 KiB Res, but this decreases even more after a while.

  1. How does sort know what the ulimit limit was set to?
  2. Why doesn't it use all the 200MB I have given it?
Simd

1 Answer

If you're using GNU sort, it calculates a default buffer size from the rlimits for data (set by ulimit -d) and RSS (set by ulimit -m), together with the sysconf values for available memory and total memory.
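To look at the inputs that calculation draws on, the limits and memory figures can be printed from the shell. This is only a rough sketch: it assumes Linux with glibc's `getconf` (where the `_PHYS_PAGES` and `_AVPHYS_PAGES` names exist), and `show_sort_inputs` is a made-up helper name, not part of sort.

```shell
# Print the limits and memory figures discussed above.
# Assumes Linux/glibc getconf names; show_sort_inputs is a hypothetical helper.
show_sort_inputs() {
    echo "data limit (ulimit -d, KiB): $(ulimit -d)"
    echo "RSS limit (ulimit -m, KiB): $(ulimit -m)"
    # Not named in the answer, but this is the limit set in the question.
    echo "virtual memory limit (ulimit -v, KiB): $(ulimit -v)"

    page=$(getconf PAGE_SIZE)             # bytes per page
    total_pages=$(getconf _PHYS_PAGES)    # total physical pages
    avail_pages=$(getconf _AVPHYS_PAGES)  # currently available pages
    echo "total memory (KiB): $(( total_pages * page / 1024 ))"
    echo "available memory (KiB): $(( avail_pages * page / 1024 ))"
}

show_sort_inputs
```

Running it before and after a `ulimit` call in the same shell shows which figures that call actually changed.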

Regardless of your ulimit, the default memory size won't exceed 3/4 of your total memory, nor the greater of your currently available memory and 1/8 of your total memory.

/* Let MEM be available memory or 1/8 of total memory, whichever
   is greater.  */
double avail = physmem_available ();
double total = physmem_total ();
double mem = MAX (avail, total / 8);

/* Leave a 1/4 margin for physical memory.  */
if (total * 0.75 < size)
  size = total * 0.75;
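Read together, the two fragments above suggest how the cap is applied. What follows is a rough sketch in shell integer arithmetic, not GNU sort's actual code. `sketch_default_size` is a hypothetical name, all values are in KiB, and `SIZE` stands for whatever starting value was derived from the rlimits. The final "never exceed MEM" step is an assumption inferred from the description above, since the quoted fragment does not show it.

```shell
# Sketch only: mirrors the quoted fragments using integer KiB arithmetic.
# Usage: sketch_default_size SIZE AVAIL TOTAL   (hypothetical helper)
sketch_default_size() {
    size=$1 avail=$2 total=$3

    # MEM = available memory or 1/8 of total memory, whichever is greater.
    mem=$(( total / 8 ))
    if [ "$avail" -gt "$mem" ]; then mem=$avail; fi

    # Leave a 1/4 margin for physical memory.
    cap=$(( total * 3 / 4 ))
    if [ "$cap" -lt "$size" ]; then size=$cap; fi

    # Assumed final step (not in the quoted fragment): never exceed MEM.
    if [ "$mem" -lt "$size" ]; then size=$mem; fi

    echo "$size"
}

# A limit-derived size of 200000 KiB on a machine with 8 GiB total and
# 4 GiB available: neither memory cap applies, so this prints 200000.
sketch_default_size 200000 4194304 8388608
```

With a much larger starting size, the memory caps take over: `sketch_default_size 100000000 1000000 8388608` yields 1048576, which is 1/8 of the total.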

With GNU sort, you can use the -S option to specify sorting buffer size:

   -S, --buffer-size=SIZE
          use SIZE for main memory buffer

SIZE can be a plain number of kilobytes (units of 1024 bytes), a number with a unit suffix (e.g. -S 100M), or a percentage of total memory (e.g. -S 55%).
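For illustration, here are the three forms of SIZE side by side, assuming GNU sort is the `sort` found on the PATH. The input is a throwaway three-line stream, so the buffer size has no visible effect on the result; only the option syntax is the point.

```shell
# Each call sorts the same tiny input; only the -S argument differs.
printf '3\n1\n2\n' | sort -n -S 200000   # plain number: kilobytes
printf '3\n1\n2\n' | sort -n -S 100M     # explicit unit suffix
printf '3\n1\n2\n' | sort -n -S 55%      # percentage of total memory
```

All three calls print the lines 1, 2, 3 in that order.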

Stuart Caie
  • This is a great answer. Thank you. ulimit -v 200000 limits virtual memory I think but sort -S 200000 limits something different. Is that right? – Simd May 16 '14 at 14:24
  • `sort` with no `-S` option chooses a default size, which is to some extent influenced by `ulimit` (RLIMIT_DATA is set by `ulimit -d`, RLIMIT_RSS is set by `ulimit -m`), but also influenced by other variables that you can't control. `sort -S` lets you cut out the ambiguity and say exactly how much you want to use. – Stuart Caie May 16 '14 at 15:02
  • Yes sorry I meant something else. I can see that sort -S 200000 limits Res to 200MB but not Virt (looking at top). But ulimit -v 200000 limits Virt, right? – Simd May 16 '14 at 15:07
  • This moves on to a slightly different topic. The virtual memory limit ultimately controls how much the data and RSS limits could be, because virtual memory size is the "ultimate limit". Virtual memory is usually left unlimited because it's used for everything, not just data - shared libraries, the executable, as well as the things users care about like private data and what portion is in memory (RSS). See [A way to determine a process's “real” memory usage?](http://stackoverflow.com/questions/118307/a-way-to-determine-a-processs-real-memory-usage-i-e-private-dirty-rss) for more discussion! – Stuart Caie May 16 '14 at 15:18
  • As an example, try running this command with and without a `ulimit -v` limit set: `echo -n "VM limit = "; ulimit -v; yes | head -100000000 | sort -S 2000M | wc`. This will tell `sort` it's allowed to use 2000 MB for its sort buffer. If there's no VM limit, it will be able to use that buffer, and it'll work. If there is a VM limit, even though it has been instructed to use 2000 MB, it won't be allowed, and the process will crash, e.g. `sort: memory exhausted` – Stuart Caie May 16 '14 at 15:22
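One practical detail when trying this experiment: `ulimit` changes the limit for the current shell and everything started from it, and a lowered hard limit cannot be raised again without privileges. A minimal sketch for scoping the limit to a single command is to set it inside a subshell. `with_vm_limit` is a hypothetical helper name, and the limit is given in KiB as in the question.

```shell
# Run a command with a virtual-memory limit (KiB) that applies only to it.
# The parentheses start a subshell, so the calling shell keeps its own limit.
with_vm_limit() {
    (
        ulimit -v "$1" || exit 1
        shift
        "$@"
    )
}

# The limit is visible inside the helper...
with_vm_limit 200000 sh -c 'echo "VM limit inside: $(ulimit -v)"'
# ...and unchanged outside it.
echo "VM limit outside: $(ulimit -v)"
```

The pipeline from the comment could then be run as `with_vm_limit 200000 sh -c 'yes | head -100000000 | sort -S 2000M | wc'` and compared against the same pipeline run without the helper.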