
I'm having memory issues in a program which I cannot isolate. I'm wondering which would be the best strategy to debug it.

My program exhausts available memory when running a line similar to this one:

Sys.command "solver file.in > file.out"

The error message is:

Fatal error: exception Sys_error("solver file.in > file.out: Cannot allocate memory")

Before the error, the program runs for about 15 seconds, consuming over 1 GB of RAM, until it finally dies.

However, running the exact same command line in the shell (with the same input file) only requires 0.7 seconds and uses less than 10 MB of RAM.

It seems something is leaking an absurd amount of memory, but I cannot identify it. When I try to isolate the error by copying the call into a new OCaml file, it behaves just like running the command directly in the shell.

For information, file.in and file.out (the expected resulting file, when running the command in the shell) are both about 200 KB large.

I tried using Unix.system instead of Sys.command, but didn't notice any difference.

I'd like to know if Sys.command has some known limitations concerning memory (e.g. excessive memory usage), and what is the best way to identify why the behavior of the external program changes so drastically.

anol

2 Answers


Sys.command just calls system() from the C library. The chances that the problem is in the thin wrapper around system() are pretty small.

Most likely some aspect of the process context is just different in the two cases.
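Because of that, it can help to tell the two possible failures apart. `Sys.command` returns the command's exit code when the shell runs, and raises `Sys_error` when the underlying `system()` call itself reports failure (for example when the child process cannot be created), which is what the error message in the question shows. A minimal sketch using only the standard library; the `outcome` type and the `run_checked` helper are hypothetical names:

```ocaml
(* Hypothetical result type: did the shell run, and if so, how did it end? *)
type outcome =
  | Exited of int          (* the shell ran; this is its exit code *)
  | Not_started of string  (* system() reported failure, e.g. no child process *)

(* Hypothetical wrapper around Sys.command that never lets Sys_error escape. *)
let run_checked (cmd : string) : outcome =
  match Sys.command cmd with
  | code -> Exited code
  | exception Sys_error msg -> Not_started msg

let () =
  match run_checked "solver file.in > file.out" with
  | Exited 0 -> print_endline "solver finished successfully"
  | Exited n -> Printf.printf "solver (or the shell) exited with code %d\n" n
  | Not_started msg -> Printf.printf "system() failed: %s\n" msg
```

If you only ever see `Not_started`, the solver itself is never the one running out of memory.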

The first thing I'd try would be to add a small amount of tracing to the solver code to get a feel for what's happening in the failure case.

If you don't have the source code for the solver, you could just try re-creating the environment that seems to work. Something like the following might be worth a try:

Sys.command "/bin/bash -l -c 'solver file.in > file.out'"

This depends on the availability of bash. The -l flag tells bash to pretend it's a login shell. If you don't have bash, you can try something similar with whatever shell you do have.
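If the solver command itself contains quotes, nesting it inside `-c '...'` by hand gets fragile. One option is to let `Filename.quote` from the standard library do the quoting. This is a sketch assuming a Unix system where `/bin/bash` exists; `login_shell_command` is a hypothetical helper:

```ocaml
(* Wrap [cmd] so it runs inside a bash login shell. On Unix, Filename.quote
   turns [cmd] into a single shell word by wrapping it in single quotes and
   escaping any single quotes it contains. *)
let login_shell_command (cmd : string) : string =
  Printf.sprintf "/bin/bash -l -c %s" (Filename.quote cmd)

let () =
  let cmd = login_shell_command "solver file.in > file.out" in
  (* Prints: /bin/bash -l -c 'solver file.in > file.out' *)
  print_endline cmd;
  Printf.printf "exit code: %d\n" (Sys.command cmd)
```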

Update

OK, it seems that the memory blowup is happening in your OCaml code before you run the solver. So the solver isn't the problem.

It's hard to say without knowing more about your OCaml code whether it's consuming a reasonable amount of memory.
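One way to get a feel for that is to have the OCaml program report its own heap size at a few points, for example right before the `Sys.command` call. A minimal sketch using `Gc.quick_stat` from the standard library; `report_heap`, `mib_of_words`, and the labels are hypothetical names, and the array is only a stand-in for your real data:

```ocaml
(* Convert a count of heap words to mebibytes (Sys.word_size is 32 or 64). *)
let mib_of_words (words : int) : float =
  float_of_int (words * (Sys.word_size / 8)) /. 1048576.0

(* Print the current and peak size of the major heap to stderr.
   Gc.quick_stat is cheap: unlike Gc.stat, it does not walk the heap. *)
let report_heap (label : string) : unit =
  let s = Gc.quick_stat () in
  Printf.eprintf "[%s] heap: %.1f MiB, peak heap: %.1f MiB\n%!"
    label (mib_of_words s.Gc.heap_words) (mib_of_words s.Gc.top_heap_words)

let () =
  report_heap "start";
  (* Hypothetical stand-in for the work done before the solver is called. *)
  let data = Array.make 1_000_000 0 in
  report_heap "just before Sys.command";
  ignore (Sys.opaque_identity data)
```

If the peak heap is already near your physical memory at the last report, the problem is in the OCaml code, not in the external command.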

On the face of it, it doesn't sound like you're running out of stack space, so I wouldn't worry about a lack of tail recursion right away, although it's often something to think about.

It actually sounds a lot like you have some kind of unbounded recursion or loop that allocates memory along the way. This will eventually exhaust your memory whether or not you have swapping turned on.

You can rule this out if your code works on a small example of whatever problem you're trying to solve. In that case, you might just have to re-engineer your solution to use less memory.
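A quick way to run that check is to measure how much the program allocates on a few increasing input sizes: if allocation grows far faster than the input, or a small input already allocates a lot, that points at the culprit. A minimal sketch; `solve` is a hypothetical placeholder for your own computation, and note that `Gc.allocated_bytes` counts total allocation, not the memory that is live at any one time:

```ocaml
(* Hypothetical stand-in for the real computation; replace with your own. *)
let solve (n : int) : int =
  let l = List.init n (fun i -> i) in
  List.fold_left ( + ) 0 l

(* Run [f x] and return its result together with the bytes allocated
   during the call (total allocation, not peak live memory). *)
let allocated_by (f : 'a -> 'b) (x : 'a) : 'b * float =
  let before = Gc.allocated_bytes () in
  let result = f x in
  let after = Gc.allocated_bytes () in
  (result, after -. before)

let () =
  List.iter
    (fun n ->
      let _, bytes = allocated_by solve n in
      Printf.printf "n = %7d: %.1f MiB allocated\n" n (bytes /. 1048576.0))
    [ 1_000; 10_000; 100_000 ]
```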

Jeffrey Scofield
  • Good suggestion, I wrapped the solver call in a Bash script and I realized the solver is never actually called. Instead, the memory allocation error occurs before calling the solver, but after the line preceding the `Sys.command` call, which contains a `printf` that is correctly output. It seems that, before actually calling `Sys.command`, some heavy things are allocated in memory. I'll try to see what's happening there. – anol Jul 14 '14 at 15:07
  • It seems the solver can indeed run, but, either due to some unlikely coincidence or due to the behavior of OCaml's garbage collector during memory shortage, the meager 6 MB needed by the solver process were not available when it was run, simply because the rest of the memory was being used by the OCaml process. I would have found it very unlikely that my memory was being entirely consumed by the OCaml process, but I seem to have deactivated swap space on my machine. Do you believe this could cause some issues when memory is scarce? – anol Jul 14 '14 at 15:25

After following Jeffrey Scofield's advice, I realized that the out-of-memory error happened before the solver was called, despite what the error message suggests.

The following simple OCaml file and Bash script were used to confirm it (matrix dimensions need to be adapted according to the available memory in your system):

test.ml:

let () =
  (* ~1.8 GB of heap on a 64-bit machine; adjust to your available memory *)
  let _m = Array.make_matrix 45000 5000 0 in
  exit (Sys.command "./test.sh")

test.sh:

#!/bin/bash

for x in {1..20000};do :;done

Using a script to measure memory usage (such as the one referenced in this question), I was able to confirm that the Bash script uses no more than 5 MB on my machine, while the original program peaks at over 1.7 GB. Even so, the error message attributes the failure to the Sys.command line, although the script itself is very unlikely to be the cause.
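If the large data is no longer needed by the time the external command runs, one mitigation worth trying (an assumption, not something tested here) is to let it become unreachable and call `Gc.compact ()` before `Sys.command`, so the runtime has a chance to give freed heap memory back to the operating system before the child process is created. Whether memory is actually returned depends on the OCaml version and the platform's allocator. `build_and_summarize` is a hypothetical stand-in for the real work:

```ocaml
(* Hypothetical stand-in for the real work: build a large matrix, keep only
   a small summary of it, and let the matrix itself become garbage. *)
let build_and_summarize () : int =
  let m = Array.make_matrix 1000 1000 1 in
  Array.fold_left (fun acc row -> acc + Array.fold_left ( + ) 0 row) 0 m

let () =
  let total = build_and_summarize () in
  (* The matrix is unreachable now; Gc.compact runs a full major collection
     (and compaction), which may let the runtime release the freed heap. *)
  Gc.compact ();
  Printf.printf "total = %d\n%!" total;
  let code = Sys.command "./test.sh" in
  Printf.printf "exit code: %d\n" code
```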

In other words, when debugging the memory usage of external commands, first make sure the external process is actually started; otherwise the error message may be misleading.
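One low-tech way to check that the process is actually started is to have the shell leave a trace before the real command runs, for example a marker file created with `touch`, and then look for it from OCaml. A minimal sketch, assuming a Unix shell with `touch` available; the marker file name and `run_with_marker` are hypothetical:

```ocaml
(* Hypothetical marker file; any writable path would do. *)
let marker = "solver.started"

(* Run [cmd] after creating [marker]. Returns whether the marker exists
   afterwards (i.e. whether the shell got as far as running anything) and
   the exit code, or None if Sys.command raised Sys_error. *)
let run_with_marker (cmd : string) : bool * int option =
  if Sys.file_exists marker then Sys.remove marker;
  let full = Printf.sprintf "touch %s && %s" (Filename.quote marker) cmd in
  let code =
    match Sys.command full with
    | c -> Some c
    | exception Sys_error _ -> None
  in
  (Sys.file_exists marker, code)

let () =
  match run_with_marker "solver file.in > file.out" with
  | true, Some c -> Printf.printf "shell started; exit code %d\n" c
  | false, None -> print_endline "Sys_error: the shell apparently never ran"
  | false, Some c -> Printf.printf "shell ran but touch failed (exit %d)\n" c
  | true, None -> print_endline "marker exists but Sys_error was raised"
```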

anol