
Our MongoDB servers suddenly started crashing in a Docker container (mongo:4.4) with this error:

couldn't open [/proc/1/stat] Too many open files in system","file":"src/mongo/util/processinfo_linux.cpp","line":78}

You can see the full log here:

db_1 | {"t":{"$date":"2020-12-01T12:47:43.080+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn60","msg":"Slow query","attr":{"type":"command","ns":"myDB.user","command":{"aggregate":"user","pipeline":[{"$match":{"__t":"user"}},{"$unwind":{"path":"$security.tokens.learner","preserveNullAndEmptyArrays":true}},{"$unwind":{"path":"$security.tokens.cp","preserveNullAndEmptyArrays":true}},{"$unwind":{"path":"$security.tokens.admin","preserveNullAndEmptyArrays":true}},{"$match":{"$or":[{"$or":[{"$and":[{"trash.status":false},{"$or":[{"learner.lock.status":false},{"learner.lock.status":{"$exists":false}}]},{"learner.suspend.status":false},{"security.tokens.learner.token":"asdjasdjkasdasdbashd"},{"security.tokens.learner.expireDate":{"$gte":{"$date":"2020-12-01T12:47:42.977Z"}}},{"shared.roles.learner":true}]},{"$and":[{"security.tokens.learner.token":"asdjasdjkasdasdbashd"},{"security.tokens.learner.expireDate":{"$gte":{"$date":"2020-12-01T12:47:42.977Z"}}},{"$or":[{"learner.lock.status":false},{"learner.lock.status":{"$exists":false}}]},{"learner.suspend.status":false},{"$or":[{"facebook.registered":true},{"google.registered":true},{"linkedin.registered":true}]}]}]},{"$and":[{"trash.status":false},{"$or":[{"cp.lock.status":false},{"cp.lock.status":{"$exists":false}}]},{"cp.suspend.status":false},{"security.tokens.cp.token":"asdjasdjkasdasdbashd"},{"security.tokens.cp.expireDate":{"$gte":{"$date":"2020-12-01T12:47:42.977Z"}}},{"shared.roles.cp":true}]},{"$and":[{"trash.status":false},{"$or":[{"admin.lock.status":false},{"admin.lock.status":{"$exists":false}}]},{"admin.suspend.status":false},{"security.tokens.admin.token":"asdjasdjkasdasdbashd"},{"security.tokens.admin.expireDate":{"$gte":{"$date":"2020-12-01T12:47:42.977Z"}}},{"shared.roles.admin":true}]}]}}],"cursor":{},"$db":"myDB"},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1569,"cursorExhausted":true,"numYields":2,"nreturned":1,"queryHash":"167A82D6","planCacheKey":"C324EA6F","reslen":4753,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":5}},"Global":{"acquireCount":{"r":5}},"Database":{"acquireCount":{"r":5}},"Collection":{"acquireCount":{"r":5}},"Mutex":{"acquireCount":{"r":3}}},"storage":{},"protocol":"op_msg","durationMillis":101}}
db_1 | {"t":{"$date":"2020-12-01T12:58:25.000+00:00"},"s":"E",  "c":"-",        "id":23077,   "ctx":"ftdc","msg":"Assertion","attr":{"error":"Location13538: couldn't open [/proc/1/stat] Too many open files in system","file":"src/mongo/util/processinfo_linux.cpp","line":78}}
db_1 | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"thread68","msg":"WiredTiger error","attr":{"error":23,"message":"[1606827505:339548][1:0x7f0bbd7e2700], log-server: __directory_list_worker, 46: /data/db/journal: directory-list: opendir: Too many open files in system"}}
db_1 | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"thread68","msg":"WiredTiger error","attr":{"error":23,"message":"[1606827505:339753][1:0x7f0bbd7e2700], log-server: __log_prealloc_once, 505: log pre-alloc server error: Too many open files in system"}}
db_1 | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"thread68","msg":"WiredTiger error","attr":{"error":23,"message":"[1606827505:339773][1:0x7f0bbd7e2700], log-server: __log_server, 961: log server error: Too many open files in system"}}
db_1 | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"thread68","msg":"WiredTiger error","attr":{"error":-31804,"message":"[1606827505:339785][1:0x7f0bbd7e2700], log-server: __log_server, 961: the process must exit and restart: WT_PANIC: WiredTiger library panic"}}
db_1 | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"F",  "c":"-",        "id":23089,   "ctx":"thread68","msg":"Fatal assertion","attr":{"msgid":50853,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp","line":520}}
db_1 | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"F",  "c":"-",        "id":23090,   "ctx":"thread68","msg":"\n\n***aborting after fassert() failure\n\n"}
db_1 | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"thread68","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
db_1 | {"t":{"$date":"2020-12-01T12:58:25.350+00:00"},"s":"E",  "c":"CONTROL",  "id":31430,   "ctx":"thread68","msg":"Error collecting stack trace","attr":{"error":"unw_get_proc_name(55CF8D21B921): unspecified (general) error\nerror: unw_step: unspecified (general) error\nunw_get_proc_name(55CF8D21B921): unspecified (general) error\nerror: unw_step: unspecified (general) error\n"}}

This is the output of ulimit -a inside the mongo container:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31806
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 90000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

How do I fix this?

hossein derakhshan
  • This seems more of a system administration question than a programming one. – Joe Dec 01 '20 at 19:28
  • Did you do any research or find similar issues? If so, add them to the post. – Minsky Dec 01 '20 at 19:45
  • What is your question? – D. SM Dec 01 '20 at 20:19
  • @D.SM How do I fix the crash? – hossein derakhshan Dec 01 '20 at 20:36
  • Increase the open file limit. My guess is the 90000 number isn't applicable to the mongod process and you need to figure out what the actual limit is that is in place. – D. SM Dec 01 '20 at 22:43
  • Here are some useful references: [Operations Checklist - Operating System Configuration](https://docs.mongodb.com/manual/administration/production-checklist-operations/#operating-system-configuration) _and_ [UNIX ulimit Settings](https://docs.mongodb.com/manual/reference/ulimit/index.html) – prasad_ Dec 02 '20 at 04:27

2 Answers


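I found the problem, and it was not related to MongoDB. Another process had a file descriptor leak and used up the available file descriptors, so MongoDB did not have enough left to work properly. This also explains why the ulimit inside the container did not help: "Too many open files in system" (error 23, ENFILE) means the system-wide limit was reached, not the per-process limit.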
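
Because the error refers to the system-wide file handle limit rather than the per-process one, it can help to check how close the whole system is to that limit. The following is a minimal sketch, assuming a Linux host with /proc mounted. The function name fd_usage and its optional path argument are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Report system-wide file handle usage on Linux.
# /proc/sys/fs/file-nr holds three numbers:
#   <allocated handles> <allocated but unused> <system-wide maximum>
fd_usage() {
    # Hypothetical optional argument: path to read, defaults to the real file
    file_nr="${1:-/proc/sys/fs/file-nr}"
    read -r allocated unused max < "$file_nr" || return 1
    echo "allocated=$allocated unused=$unused max=$max"
    # Integer percentage of the system-wide limit currently in use
    echo "used_percent=$(( allocated * 100 / max ))"
}

# Only run against the real file when it is readable (that is, on Linux)
if [ -r /proc/sys/fs/file-nr ]; then
    fd_usage
fi
```

If the allocated count is close to the maximum, some process on the machine is probably holding far more descriptors than it should, and raising ulimit -n for mongod alone is unlikely to help.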

First, I counted the open file descriptors of every process with this command (for background on what a file descriptor is, see https://stackoverflow.com/a/5256705/7339000):

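# Count the open file descriptors of every running process.
# Run as root, otherwise other users' processes are reported with 0.
cd /proc
for pid in [0-9]*
do
    # 2>/dev/null hides errors for processes that exit mid-loop or are not readable
    echo "PID = $pid with $(ls "/proc/$pid/fd/" 2>/dev/null | wc -l) file descriptors"
done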

Then I realized we had a Node.js process holding more than 40,000 file descriptors, which was unusual: it was leaking file descriptors. Once we fixed that leak, we no longer had any issues with MongoDB.
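
To spot an outlier like that more quickly, the per-process counts can be sorted so the largest consumers come first. This is a rough sketch under the same Linux /proc assumption. The function name top_fd_procs and its two optional arguments are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# List the processes holding the most file descriptors, highest first.
# $1 = how many entries to show (default 5)
# $2 = proc root (default /proc); hypothetical parameter, mainly for testing
top_fd_procs() {
    limit="${1:-5}"
    proc_root="${2:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        # Skip entries without an fd directory (process gone, or glob did not match)
        [ -d "$dir/fd" ] || continue
        count=$(ls "$dir/fd" 2>/dev/null | wc -l | tr -d ' ')
        # comm holds the short command name; fall back to "?" if unreadable
        name=$(cat "$dir/comm" 2>/dev/null || echo '?')
        echo "$count ${dir##*/} $name"
    done | sort -rn | head -n "$limit"
}

# Output columns: <fd count> <pid> <command name>
top_fd_procs 5
```

Run it as root to get complete counts. A process whose count keeps growing between runs is a likely candidate for a descriptor leak.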

hossein derakhshan

The best way to solve this problem is:

For macOS systems that have installed MongoDB Community using the brew installation method, the recommended open files value is automatically set when you start MongoDB through brew services. So reinstalling MongoDB with brew and starting it through brew services should solve this problem (https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-os-x/).

For example:

brew install mongodb-community@4.4

For macOS systems running MongoDB Enterprise or using the TGZ installation method, use the launchctl limit command to set the recommended values. See your operating system documentation for the precise procedure for changing system limits on running systems.

Cristian Zumelzu