How to "hibernate" a process in Linux by storing its memory to disk and restoring it later?

Question

Is it possible to 'hibernate' a process in Linux? Just like 'hibernate' on a laptop, I would like to write all the memory used by a process to disk and free up the RAM. Later on, I could 'resume the process', i.e., read all the data back from disk into RAM and continue with my process.

What you describe is actually often referred to as 'checkpointing', you might have better luck searching with that term. — Tim Post, Jan 26 '10 at 06:16
https://unix.stackexchange.com/questions/43854/save-entire-process-for-continuation-after-reboot — Ciro Santilli OurBigBook.com, Sep 08 '17 at 15:33

score 59 · Accepted Answer · edited Aug 16 '11 at 17:01

I used to maintain CryoPID, which is a program that does exactly what you are talking about. It writes the contents of a program's address space, VDSO, file descriptor references and states to a file that can later be reconstructed. CryoPID started when there were no usable hooks in Linux itself and worked entirely from userspace (actually, it still does work, depending on your distro / kernel / security settings).

Problems were (indeed) sockets, pending RT signals, numerous X11 issues, and the glibc caching getpid() implementation, amongst many others. Randomization (especially VDSO) turned out to be insurmountable for the few of us working on it after Bernard walked away from it. However, it was fun and became the topic of several master's theses.

If you are just contemplating a program that can save its running state and restart directly into that state, it's far, far easier to just save that information from within the program itself, perhaps when servicing a signal.
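As a rough illustration of saving state from within the program itself when servicing a signal, here is a minimal sketch. Everything in it is hypothetical (the `STATE_FILE` path, the function names, and the counter that stands in for real application state); it only shows the pattern of "load on start, save in the signal handler".

```shell
#!/bin/sh
# Hypothetical worker: its whole "state" is one counter kept in $count.
STATE_FILE="${STATE_FILE:-./worker.state}"   # assumed location of the saved state
count=0

# Pick up the counter from a previous run, if a non-empty state file exists.
load_state() {
    if [ -s "$STATE_FILE" ]; then
        count=$(cat "$STATE_FILE")
    fi
}

# Write the current counter to disk.
save_state() {
    printf '%s\n' "$count" > "$STATE_FILE"
}

# Count up to the given limit; on SIGTERM or SIGINT, save the counter and exit
# so that the next run continues from the same value.
run_worker() {
    limit=$1
    load_state
    trap 'save_state; exit 0' TERM INT
    while [ "$count" -lt "$limit" ]; do
        count=$((count + 1))
        sleep 1
    done
    rm -f "$STATE_FILE"          # finished normally, nothing left to resume
    echo "finished at $count"
}
```

Calling `run_worker 1000` at the end of the script, sending the process `kill -TERM`, and starting the script again would continue from the saved counter. The shell only runs the trap after the current `sleep` returns, so the save can lag the signal by up to a second.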

answered Jan 26 '10 at 06:22 by Tim Post · edited Aug 16 '11 at 17:01 by Andy Balaam

As of July 2014, unfortunately, CryoPID is no longer maintained and does not run on recent kernels. In the meantime, new projects have been born (some steps have even been taken toward TCP connection "hibernation"). I've put an [answer](http://stackoverflow.com/a/24991425/1161591) below with updated information. Check it out! ;) – dappiu Jul 28 '14 at 13:24

@dappiu That's great - but CryoPID was just an _example_ in this answer to illustrate how tricky it can be, where I went on to suggest they handle saving the state within the program itself, in such a way that can be easily resumed. CryoPID stagnating doesn't make the answer less relevant. – Tim Post Jul 29 '14 at 04:55
Cryopid2 is more recently active (2013): http://sourceforge.net/projects/cryopid2/ – Leopd Dec 15 '14 at 20:28

score 39 · Answer 2 · answered Jul 28 '14 at 08:44

I'd like to put a status update here, as of 2014.

The accepted answer suggests CryoPID as a tool to perform checkpoint/restore, but I found the project to be unmaintained and impossible to compile with recent kernels. However, I found two actively maintained projects that provide application checkpointing.

The first, which I recommend because I have had better luck running it, is CRIU, which performs checkpoint/restore mainly in userspace and requires the kernel option CONFIG_CHECKPOINT_RESTORE to be enabled.

Checkpoint/Restore In Userspace, or CRIU (pronounced kree-oo, IPA: /krɪʊ/, Russian: криу), is a software tool for Linux operating system. Using this tool, you can freeze a running application (or part of it) and checkpoint it to a hard drive as a collection of files. You can then use the files to restore and run the application from the point it was frozen at. The distinctive feature of the CRIU project is that it is mainly implemented in user space.
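A dump/restore cycle with CRIU can be sketched roughly as follows. This is not taken from the answer: the function names and the `CRIU` override variable are hypothetical conveniences, while `dump`, `restore`, `--tree`, `--images-dir` and `--shell-job` are CRIU's own subcommands and options. The sketch assumes CRIU is installed, the kernel was built with CONFIG_CHECKPOINT_RESTORE, and the commands are run as root.

```shell
#!/bin/sh
# Assumption: CRIU names the binary to call; overriding it (e.g. CRIU=echo)
# turns the helpers into a dry run that only prints the arguments.
CRIU="${CRIU:-criu}"

# checkpoint_process <pid> <image-dir>
# Dumps the process tree rooted at <pid> into <image-dir> as a set of image files.
# --shell-job is needed when the process was started from an interactive shell.
checkpoint_process() {
    pid=$1
    image_dir=$2
    mkdir -p "$image_dir" || return 1
    "$CRIU" dump --tree "$pid" --images-dir "$image_dir" --shell-job
}

# restore_process <image-dir>
# Recreates the process tree from the images written by checkpoint_process.
restore_process() {
    image_dir=$1
    "$CRIU" restore --images-dir "$image_dir" --shell-job
}
```

For example, `checkpoint_process 1234 /var/tmp/ckpt` followed later by `restore_process /var/tmp/ckpt` (the PID and path are placeholders). By default `criu dump` terminates the dumped process once the images are written, which is what actually frees its RAM.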

The second is DMTCP; quoting from their main page:

DMTCP (Distributed MultiThreaded Checkpointing) is a tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. It operates directly on the user binary executable, without any Linux kernel modules or other kernel modifications.

There is also a nice Wikipedia page on the topic: Application_checkpointing

score 23 · Answer 3 · answered Jan 26 '10 at 23:44

The answers mentioning ctrl-z are really talking about stopping the process with a signal, in this case SIGTSTP. You can issue a stop signal with kill:

kill -STOP <pid>

That will suspend execution of the process. It won't immediately free the memory used by it, but as memory is required for other processes the memory used by the stopped process will be gradually swapped out.

When you want to wake it up again, use

kill -CONT <pid>

The more complicated solutions, like CryoPID, are really only needed if you want the stopped process to be able to survive a system shutdown/restart - it doesn't sound like you need that.
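To see the stop/continue cycle in action, the following sketch suspends a throwaway `sleep` process and reads its state from `/proc/<pid>/status` (Linux only). The helper names `proc_state` and `wait_for_state` are made up for this example, and the `sleep 60` job merely stands in for the real long-running process.

```shell
#!/bin/sh
# Print the one-letter state from /proc/<pid>/status (T = stopped, S = sleeping).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Poll about once a second (at most 5 times) until <pid> reaches <state>.
# Signal delivery is asynchronous, so the state may lag the kill call slightly.
wait_for_state() {
    tries=0
    while [ "$(proc_state "$1")" != "$2" ]; do
        tries=$((tries + 1))
        [ "$tries" -le 5 ] || return 1
        sleep 1
    done
}

sleep 60 &                # stand-in for the real long-running job
pid=$!

kill -STOP "$pid"         # suspend: the process stays in RAM but is never scheduled
wait_for_state "$pid" T && echo "pid $pid is stopped"

kill -CONT "$pid"         # resume where it left off
wait_for_state "$pid" S && echo "pid $pid is running again"

kill "$pid"               # clean up the demo process
```

While the process is stopped its pages remain resident until the kernel decides to swap them out under memory pressure, so this does not survive a reboot.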

score 16 · Answer 4 · answered Jul 14 '12 at 02:14

The Linux kernel has now partially implemented checkpoint/restart features: https://ckpt.wiki.kernel.org/. The status is here.

Some useful information is on LWN (Linux Weekly News): http://lwn.net/Articles/375855/ http://lwn.net/Articles/412749/ ......

So the answer is "YES"

answered Jul 14 '12 at 02:14 by Lai Jiangshan

The userspace program is called blcr. – Behrooz Apr 18 '13 at 15:54

score 15 · Answer 5 · answered Jan 25 '10 at 19:15

The issue is restoring the streams - files and sockets - that the program has open.

When your whole OS hibernates, the local files and such can obviously be restored. Network connections can't, but code that accesses the internet typically does more error checking and survives the error conditions (or ought to).

If you did per-program hibernation (without application support), how would you handle open files? What if another process accesses those files in the interim? etc?
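To get a feel for what would have to be restored, the open descriptors of a running process can be listed from `/proc/<pid>/fd` on Linux. The sketch below is only an illustration; `list_open_files` is a hypothetical helper name, and the final line lists the descriptors of the shell running the script.

```shell
#!/bin/sh
# Print "<fd> -> <target>" for every descriptor held by the given PID.
# Targets may be regular files, pipes ("pipe:[...]"), sockets ("socket:[...]"),
# or unlinked files (shown with a " (deleted)" suffix).
list_open_files() {
    for fd in /proc/"$1"/fd/*; do
        # Skip entries that vanished between the glob and the readlink.
        [ -e "$fd" ] || [ -L "$fd" ] || continue
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

list_open_files "$$"
```

Every entry in that list is kernel-side state: a checkpoint tool has to reopen each file at the right offset, and it has no good answer for pipes and sockets whose other end has gone away in the meantime.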

Maintaining state when the program is not loaded is going to be difficult.

Simply suspending the threads and letting it get swapped to disk would have much the same effect?

Or run the program in a virtual machine and let the VM handle suspension.

score 12 · Answer 6 · answered Jan 25 '10 at 19:22

Short answer is "yes, but not always reliably". Check out CryoPID:

http://cryopid.berlios.de/

Open files will indeed be the most common problem. CryoPID states explicitly:

Open files and offsets are restored. Temporary files that have been unlinked and are not accessible on the filesystem are always saved in the image. Other files that do not exist on resume are not yet restored. Support for saving file contents for such situations is planned.

The same issues will also affect TCP connections, though CryoPID supports tcpcp for connection resuming.

After hitting the submit button I now realize this reads a lot like spam/advertisement for CryoPID. It is not -- I am simply a satisfied user of the utility, really. — Ulisses Montenegro, Jan 25 '10 at 19:45

score 7 · Answer 7 · edited Jun 10 '12 at 12:01

I extended Cryopid producing a package called Cryopid2 available from SourceForge. This can migrate a process as well as hibernating it (along with any open files and sockets - data in sockets/pipes is sucked into the process on hibernation and spat back into these when process is restarted).

The reason I have not been active with this project is that I am not a kernel developer. Both this and the original cryopid need to get someone on board who can get them running with the latest kernels (e.g. Linux 3.x).

The Cryopid method does work - and is probably the best solution to general purpose process hibernation/migration in Linux I have come across.

score 6 · Answer 8 · edited Jan 25 '10 at 19:20

The short answer is "yes." You might start by looking at this for some ideas: ELF executable reconstruction from a core image (http://vx.netlux.org/lib/vsc03.html)

answered Jan 25 '10 at 19:10 by fullreset

Interesting link; but the link does point out it doesn't work reliably – Will Jan 25 '10 at 19:17

score 3 · Answer 9 · answered Jan 25 '10 at 19:43

As others have noted, it's difficult for the OS to provide this functionality, because the application needs to have some error checking builtin to handle broken streams.

However, on a side note, some programming languages and tools that use virtual machines explicitly support this functionality, such as the Self programming language.

score 1 · Answer 10 · answered Jan 25 '10 at 19:27

This is sort of the ultimate goal of a clustered operating system. Matthew Dillon has put a lot of effort into implementing something like this in his DragonFly BSD project.

answered Jan 25 '10 at 19:27 by Nikolai Fetissov

Is this feature fully implemented in Dragonfly BSD ? – Arjun J Rao Aug 12 '13 at 11:37

score 1 · Answer 11 · edited Jun 30 '20 at 03:44

Adding another workaround: you can use VirtualBox. Run your applications in a regular virtual machine and simply "save the machine state" whenever you want. I know this is not an answer, but I thought it could be useful when there are no real options.

If for any reason you don't like VirtualBox, VMware and QEMU are just as good.

answered Jun 29 '20 at 05:59 by Omid Ataollahi

score 0 · Answer 12 · answered Jan 25 '10 at 19:15

Ctrl-Z increases the chances the process's pages will be swapped out, but it doesn't free the process's resources completely. The problem with freeing a process's resources completely is that things like file handles and sockets are kernel resources that the process gets to use but doesn't know how to persist on its own. So Ctrl-Z is as good as it gets.

score 0 · Answer 13 · answered Jan 25 '10 at 19:20

There was some research on checkpoint/restore for Linux back in the 2.2 and 2.4 days, but it never made it past the prototype stage. It is possible (with the caveats described in the other answers) for certain values of possible: if you can write a kernel module to do it, it is possible. But for the common value of possible (can I do it from the shell on a commercial Linux distribution), it is not yet possible.

score -2 · Answer 14 · answered Jan 25 '10 at 19:09

There's Ctrl+Z in Linux, but I'm not sure it offers the features you specified. I suspect you asked this question because it doesn't.

answered Jan 25 '10 at 19:09 by Simon Walker
