Can Checkpoint/restart be implemented using the core dump of a process? The core file contains a complete memory dump of the process, thus in theory it should be possible to restore the process to the same state it was in when the core was dumped.
Actually, I am asking about a particular process. As the file descriptor table is maintained by the kernel and thus doesn't lie in the process's address space, restoring the file descriptors may be troublesome. – rogue_knight9 Apr 16 '13 at 21:50
4 Answers
Yes, this is possible. GNU Emacs does this to optimize its startup time. It loads a bunch of Lisp files to produce an image and then dumps a core which can be restarted.
Several years ago, I created a patch for GNU Make 3.80 to do exactly the same thing (using code borrowed from GNU Emacs).
With this patch, you have a new option in make: `make --dump`. The utility now reads your `Makefile`, and then instead of executing the rules, it produces a core dump which can be restarted to do the actual build (evaluation of the parsed rule tree).
This was a saving, because the project was so large that loading all of the make rules across the source tree took thirty seconds! With this optimization, incremental builds launched almost instantly, without the half minute startup penalty.
No kernel support is required for this. What is required is knowledge about the structure of the core file.
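To give an idea of what "knowledge about the structure of the core file" means on Linux: a core file is an ELF file whose `e_type` is `ET_CORE`, with `PT_LOAD` segments holding the dumped memory regions and a `PT_NOTE` segment holding registers and process information. The following is only a minimal sketch, assuming a 64-bit little-endian ELF; `read_core_segments` and `make_fake_core` are hypothetical helper names, and the fake core exists only so the parser can be tried without a real dump. This is not the code used by Emacs or the make patch.

```python
import struct

ET_CORE = 4   # e_type value that marks an ELF file as a core dump
PT_LOAD = 1   # program header type: a dumped memory region
PT_NOTE = 4   # program header type: notes (registers, process info)

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 little-endian file header (64 bytes)
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header (56 bytes)


def read_core_segments(data):
    """Return (p_type, vaddr, file_offset, file_size) for every segment."""
    fields = EHDR.unpack_from(data, 0)
    ident, e_type = fields[0], fields[1]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    if e_type != ET_CORE:
        raise ValueError("ELF file is not a core dump")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]

    segments = []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, p_filesz, _memsz, _align) = PHDR.unpack_from(
            data, e_phoff + i * e_phentsize)
        segments.append((p_type, p_vaddr, p_offset, p_filesz))
    return segments


def make_fake_core():
    """Build a tiny synthetic core image (hypothetical, for illustration only)."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    payload_offset = EHDR.size + 2 * PHDR.size  # data follows the two headers
    ehdr = EHDR.pack(ident, ET_CORE, 62, 1, 0, EHDR.size, 0, 0,
                     EHDR.size, PHDR.size, 2, 0, 0, 0)
    note = PHDR.pack(PT_NOTE, 0, payload_offset, 0, 0, 0, 0, 0)
    load = PHDR.pack(PT_LOAD, 6, payload_offset, 0x400000, 0, 4, 4, 4096)
    return ehdr + note + load + b"heap"


if __name__ == "__main__":
    for p_type, vaddr, offset, size in read_core_segments(make_fake_core()):
        print(f"type={p_type} vaddr={vaddr:#x} offset={offset} size={size}")
```

A restarter would map each `PT_LOAD` region back at its `vaddr` and reload the registers from the notes; everything the kernel holds on behalf of the process is not in the file at all.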
In addition to this approach, there was a process checkpointing project for Linux many years ago (I wonder what happened to that).

I can, but it would take some work. The `make` patch was never released to the public, so to find it I have to dig through some archived disks which are offline. The GNU Emacs core dump code is easy to find in the small core of C sources. – Kaz Apr 16 '13 at 23:10
The `unexec` of Emacs is not dumping a *core* file, but implementing some persistent heap machinery, which is not exactly the same thing. The original poster could use some checkpointing library like https://ftg.lbl.gov/projects/CheckpointRestart/; see also http://en.wikipedia.org/wiki/Checkpointing – Basile Starynkevitch Apr 17 '13 at 04:58
As I commented, you could look into application checkpointing and use a library like Berkeley Lab Checkpoint/Restart (BLCR). However, these libraries don't use exactly a core(5) dump file, and they have several limitations and conventions on what the checkpointing program can do and what exactly is persistent in the checkpoint image (open file descriptors and network sockets usually cannot be persisted).
Some Unix systems (and perhaps some patched Linux kernels) had limited checkpoint facilities in the kernel itself (in the 1980s Cray Unix had some).

No, this is not possible in general without special support from the kernel. The kernel maintains a LOT of per-process state, such as the file descriptor table, IPC objects, etc.
If you were willing to make lots of simplifying assumptions, such as no open files, no open sockets, no living IPC objects, no shared memory regions, and more, then in theory it would be possible, but in practice I don't believe it's possible with Linux even with those concessions.
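To make the file descriptor point concrete: the process's memory only holds a small integer, while the open file and its current offset live in the kernel. A checkpointer therefore has to record that state separately and recreate it on restart. The sketch below is only an illustration for a plain read-only file; `snapshot_fd`, `restore_fd` and `demo` are hypothetical names, and it assumes the caller still knows the path and that the file is unchanged at restore time. Sockets, pipes and IPC objects have no such simple recipe.

```python
import os
import tempfile


def snapshot_fd(fd, path):
    """Record what a memory dump would miss: which file, and how far into it."""
    # The offset is kernel state; ask for it without moving it.
    return {"path": path, "offset": os.lseek(fd, 0, os.SEEK_CUR)}


def restore_fd(snap):
    """Reopen the file and put the kernel-side offset back where it was."""
    fd = os.open(snap["path"], os.O_RDONLY)
    os.lseek(fd, snap["offset"], os.SEEK_SET)
    return fd


def demo():
    """Read part of a file, 'checkpoint', close, 'restart', and keep reading."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello world")
        path = f.name
    try:
        fd = os.open(path, os.O_RDONLY)
        os.read(fd, 6)                 # consume "hello "
        snap = snapshot_fd(fd, path)   # checkpoint the descriptor state
        os.close(fd)                   # stands in for the process going away
        new_fd = restore_fd(snap)      # restart: rebuild the descriptor
        try:
            return os.read(new_fd, 5)  # continues where the old fd left off
        finally:
            os.close(new_fd)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Note that the restored descriptor may also get a different number than the original, so a real implementation would have to `dup2` it onto the old number before resuming the program.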

All the information is in DDR, so why isn't it possible to restore the machine to the exact state it was in before the crash? – 0x90 Apr 16 '13 at 22:33
@0x90: it is impossible to checkpoint the entire state, because the sockets, IPC objects, etc. depend not only on the checkpointed process but also on external factors (other processes, remote TCP connections, etc.) – Basile Starynkevitch Apr 17 '13 at 05:04
Debian has a number of packages you might want to look at:
- blcr-util - Userspace tools to Checkpoint and Restart Linux processes. This is related to BLCR (Berkeley Lab Checkpoint/Restart), see https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#whatisblcr
- criu - checkpoint and restore in userspace, see https://criu.org/Main_Page
  - docker - supports checkpointing in recent versions, see https://criu.org/Docker
  - containerd - daemon to control runC; this contains a checkpointing facility that is interesting.

See also OpenVZ, which supports live migration: https://openvz.org/Checkpointing_and_live_migration
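For criu, a checkpoint/restore cycle on a single process tree boils down to two commands, `criu dump` and `criu restore`. The sketch below only builds the argument lists so it can be inspected without criu installed; `criu_dump_cmd` and `criu_restore_cmd` are hypothetical helper names, and it assumes the process was started from a shell (hence `--shell-job`) and that you have sufficient privileges (typically root) to run criu.

```python
def criu_dump_cmd(pid, images_dir, leave_running=False):
    """Build the argv for checkpointing the process tree rooted at pid."""
    cmd = ["criu", "dump",
           "--tree", str(pid),          # root of the process tree to dump
           "--images-dir", images_dir,  # where the image files are written
           "--shell-job"]               # process is attached to a terminal
    if leave_running:
        cmd.append("--leave-running")   # do not kill the process after the dump
    return cmd


def criu_restore_cmd(images_dir):
    """Build the argv for restoring from a previously written image directory."""
    return ["criu", "restore",
            "--images-dir", images_dir,
            "--shell-job"]


if __name__ == "__main__":
    # To actually run them (assumption: criu is installed, privileges are sufficient):
    #   import subprocess; subprocess.run(criu_dump_cmd(1234, "/tmp/ckpt"), check=True)
    print(" ".join(criu_dump_cmd(1234, "/tmp/ckpt")))
    print(" ".join(criu_restore_cmd("/tmp/ckpt")))
```

The PID `1234` and the directory `/tmp/ckpt` are placeholders; the image directory must exist before dumping.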

https://news.ycombinator.com/item?id=12180914 see this article with a lot of discussion. – h4ck3rm1k3 Sep 10 '16 at 17:30