Using the Java driver, we discovered today that it is possible to bring a MongoDB instance down with a segmentation fault.

new Mongo().getDB("test").getCollection("test").
    insert(new BasicDBObject("\u0000Žö", ""));

This will produce the following output from mongod before it dies:

Fri Nov 16 18:53:18 Invalid access at address: 0xbac3c5fe from thread: conn5

Fri Nov 16 18:53:18 Got signal: 11 (Segmentation fault: 11).

Fri Nov 16 18:53:18 Backtrace:
0x10004241b 0x10005628b 0x100056941 0x7fff828afcfa 0x1 0x100281611 0x100288c91 0x10006c501 0x10058e50c 0x1005e31d3 0x7fff8285b8bf 0x7fff8285eb75 
 0   mongod                              0x000000010004241b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x000000010005628b _ZN5mongo10abruptQuitEi + 987
 2   mongod                              0x0000000100056941 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 673
 3   libsystem_c.dylib                   0x00007fff828afcfa _sigtramp + 26
 4   ???                                 0x0000000000000001 0x0 + 1
 5   mongod                              0x0000000100281611 _ZN5mongo14receivedInsertERNS_7MessageERNS_5CurOpE + 1841
 6   mongod                              0x0000000100288c91 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE + 4705
 7   mongod                              0x000000010006c501 _ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE + 257
 8   mongod                              0x000000010058e50c _ZN5mongo3pms9threadRunEPNS_13MessagingPortE + 1084
 9   mongod                              0x00000001005e31d3 thread_proxy + 163
 10  libsystem_c.dylib                   0x00007fff8285b8bf _pthread_start + 335
 11  libsystem_c.dylib                   0x00007fff8285eb75 thread_start + 13

I've been trying to understand what on earth makes this magical field name special. Removing any of the characters involved lets MongoDB survive just fine, and the stack trace isn't making me any wiser.

I've written up a short blog post on the issue, and filed a JIRA ticket at mongodb.org, but my curiosity is killing me. Can anyone figure out what makes \u0000Žö special?

Edit to clarify: \u0000 and \u0000Ž are fine, and so is \u0000Žsomerandomtext

Joel Westberg
  • Is this bug resolved by the latest MongoDB Java driver, 2.10.1? I tested it and found that without --objcheck turned on, the db server still crashes. – DiveInto Jan 21 '13 at 07:24
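
For anyone who cannot rely on the server rejecting such documents, one possible stopgap is to refuse NUL characters in field names on the client before they reach the driver. The sketch below is only an illustration: `FieldNames` and `requireNoNul` are hypothetical names, not part of the MongoDB Java driver, and nothing in this thread confirms that the driver performs a check like this.

```java
public class FieldNames {
    // Hypothetical helper, not a driver API: rejects field names that contain
    // a NUL character and returns the name unchanged otherwise.
    public static String requireNoNul(String fieldName) {
        if (fieldName.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("field name contains a NUL character");
        }
        return fieldName;
    }
}
```

A caller would wrap each key, for example `new BasicDBObject(FieldNames.requireNoNul(key), value)`, so that a bad name fails in the client with an exception instead of being sent to mongod.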

2 Answers


The reason it's not working is the way the unicode literal gets corrupted. It's possible that the Java driver does not properly check for unicode inside a unicode literal ;) In the mongo shell, creating an object with such a key (copied to a UTF-8-based terminal) throws an error: BSONElement: bad type -59

mrówa
  • Adding the unicode field as part of a subdocument works though `db.test.save({"x" : {"\u0000Žö" : ""}})` and results in an assertion error when trying to view the document via `db.test.find()`... Also, what do you mean by "unicode in unicode literal statement"? – Joel Westberg Nov 16 '12 at 19:35
  • Okay, I might be blind: I thought you had missed one zero and that it would be \u000'UNICODECHAR'. Next possible explanation: strings in BSON are null-terminated, and Java (http://stackoverflow.com/questions/318775/null-u0000-in-java-string) might have had problems with that in some of the libraries around it. If it actually sends the server a string with a null at the beginning, we have our answer. – mrówa Nov 16 '12 at 21:56
  • This turned out to be the case. It does seem to be the NUL termination in the BSON format that is the problem. The crashes happen because, once a NUL is reached, the parser attempts to read whatever comes next as some sort of control string. If the next two characters that follow the NUL are both multi-byte UTF-8 characters, it crashes while parsing. – Joel Westberg Nov 19 '12 at 14:34
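
To make the NUL-termination explanation concrete, here is a small sketch that prints the UTF-8 bytes of the field name from the question and shows where a reader that stops at the first NUL byte would cut the name off. It is not mongod's parser and does not use the driver; the class and method names (`NulKeyBytes`, `utf8`, `cstringLength`) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class NulKeyBytes {
    // UTF-8 bytes of a field name, before any terminating NUL is appended.
    public static byte[] utf8(String fieldName) {
        return fieldName.getBytes(StandardCharsets.UTF_8);
    }

    // Length of the name as seen by a reader that stops at the first NUL byte.
    public static int cstringLength(byte[] bytes) {
        int i = 0;
        while (i < bytes.length && bytes[i] != 0) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // The field name from the question: NUL, Z with caron, o with diaeresis.
        byte[] key = utf8("\u0000\u017D\u00F6");

        StringBuilder hex = new StringBuilder();
        for (byte b : key) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println("UTF-8 bytes: " + hex.toString().trim());
        System.out.println("name length seen by a NUL-terminated reader: " + cstringLength(key));
        System.out.println("byte after the NUL, as a signed byte: " + key[1]);
    }
}
```

The name encodes to `00 c5 bd c3 b6`, so a NUL-terminated reader sees an empty name and the remaining bytes are left to be interpreted as something else. The byte right after the NUL is 0xc5, which is -59 as a signed byte. That would be consistent with the `bad type -59` error from the shell, although this sketch alone does not prove it is the same code path.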

Java Strings can only be expressed in ASCII encoding. To get a unicode literal, you'll have to write out each unicode code and build the string:

String unicode = "\u1123\u5678hello";

will result in:

ᄣ噸hello

Mordechai