I'm using GCP's Cloud Notebook VMs. I have a VM with 200+ GB of RAM running and am attempting to download about 70 GB of data from BigQuery into memory using the BigQuery Storage API.
Once it gets to around 50 GB, the kernel crashes.
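For context, this is roughly how I'm pulling the data down (the project/dataset/table names here are placeholders, not my real ones):

```python
from google.cloud import bigquery
from google.cloud import bigquery_storage

bq_client = bigquery.Client()
bqstorage_client = bigquery_storage.BigQueryReadClient()

# Placeholder query; the real one returns roughly 70 GB of rows.
query = "SELECT * FROM `my-project.my_dataset.my_table`"

# Stream the result via the BigQuery Storage API into a pandas DataFrame, all in memory.
df = bq_client.query(query).result().to_dataframe(bqstorage_client=bqstorage_client)
```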
Tailing the logs with sudo tail -20 /var/log/syslog, here's what I find:
Dec 2 13:35:57 pytorch-20200908-152245 kernel: [60783.550367] Task in /system.slice/jupyter.service killed as a result of limit of /system.slice/jupyter.service
Dec 2 13:35:57 pytorch-20200908-152245 kernel: [60783.563843] memory: usage 53350876kB, limit 53350964kB, failcnt 1708893
Dec 2 13:35:57 pytorch-20200908-152245 kernel: [60783.570582] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Dec 2 13:35:57 pytorch-20200908-152245 kernel: [60783.578694] kmem: usage 110900kB, limit 9007199254740988kB, failcnt 0
Dec 2 13:35:57 pytorch-20200908-152245 kernel: [60783.585267] Memory cgroup stats for /system.slice/jupyter.service: cache:752KB rss:53239292KB rss_huge:0KB mapped_file:60KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:53239292KB inactive_file:400KB active_file:248KB unevictable:0KB
Dec 2 13:35:57 pytorch-20200908-152245 kernel: [60783.612963] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 2 13:35:57 pytorch-20200908-152245 kernel: [60783.621645] [ 787] 1003 787 99396 17005 63 3 0 0 jupyter-lab
Dec 2 13:35:57 pytorch-20200908-152245 kernel: [60783.632295] [ 2290] 1003 2290 4996 966 14 3 0 0 bash
Dec 2 13:35:57 pytorch-20200908-152245 kernel: [60783.642309] [13143] 1003 13143 1272679 26639 156 6 0 0 python
Dec 2 13:35:58 pytorch-20200908-152245 kernel: [60783.652528] [ 5833] 1003 5833 16000467 13268794 26214 61 0 0 python
Dec 2 13:35:58 pytorch-20200908-152245 kernel: [60783.661384] [ 6813] 1003 6813 4996 936 14 3 0 0 bash
Dec 2 13:35:58 pytorch-20200908-152245 kernel: [60783.670033] Memory cgroup out of memory: Kill process 5833 (python) score 996 or sacrifice child
Dec 2 13:35:58 pytorch-20200908-152245 kernel: [60783.680823] Killed process 5833 (python) total-vm:64001868kB, anon-rss:53072876kB, file-rss:4632kB, shmem-rss:0kB
Dec 2 13:38:07 pytorch-20200908-152245 sync_gcs_service.sh[806]: GCS bucket is not specified in GCE metadata, skip GCS sync
Dec 2 13:39:03 pytorch-20200908-152245 bash[787]: [I 13:39:03.463 LabApp] Saving file at /outlog.txt
I followed the guidance in "How to increase Jupyter notebook Memory limit?" and allocated 100 GB of RAM, but it's still crashing at around 55 GB; e.g., 53350964kB is the limit shown in the logs.
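For reference, the change that linked answer suggests is raising the notebook server's buffer size in jupyter_notebook_config.py; what I set was along these lines (value in bytes):

```python
# ~/.jupyter/jupyter_notebook_config.py, per the linked answer
c.NotebookApp.max_buffer_size = 100 * 1024 * 1024 * 1024  # ~100 GB
```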
How can I utilize the available memory of my machine? Thanks!
Tacking on what worked: raising the cgroup memory limit that is applied to jupyter.service, i.e. this setting:
/sys/fs/cgroup/memory/system.slice/jupyter.service/memory.limit_in_bytes
to a higher number.
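For example, something like the following (the 200 GB figure is just an illustration, pick whatever fits your VM; a direct write like this presumably won't survive a restart of the service, so the corresponding limit in the jupyter.service systemd unit would also need raising to make it permanent):

```sh
# Check the current limit; this is the ~53 GB value from the OOM log above
cat /sys/fs/cgroup/memory/system.slice/jupyter.service/memory.limit_in_bytes

# Raise it, e.g. to 200 GB (value is in bytes, needs root)
echo $((200 * 1024 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/memory/system.slice/jupyter.service/memory.limit_in_bytes
```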