
Just checking whether this is possible and whether it is worth the trouble:

I'm using Python 3.9.5, with mongoengine as the ORM.

Let's say I have data to save in a collection:

{
    "value": string,
    "origin": string,
    "parent": string,
    ....
}

where the fields "value" and "origin" are unique together.

Now, I want to control the creation of the _id field, i.e., to generate the ObjectId value from those two values, so that it is still unique and does not hurt performance, yet is not just a randomly generated value.

Any thoughts or ideas about this?

Update: what if I want to keep using ObjectId for the _id field?

An ObjectId accepts only 12 bytes or a 24-character hex string. So I took an idea from here and wrote this piece of code (which creates a 24-character hex hash):

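import hashlib
from bson.objectid import ObjectId

the_string = "..............."
# Keep the first 24 hex characters (12 bytes) of the SHA-256 digest.
# Reducing modulo 10**28 can yield fewer than 24 hex digits, which ObjectId rejects.
hex_24 = hashlib.sha256(the_string.encode("utf-8")).hexdigest()[:24]
oid = ObjectId(hex_24)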

Any thoughts about this code? Are there performance issues? Will the value be unique enough to be used as an id?
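A minimal sketch of the same idea applied to the two fields directly, assuming the id should depend only on "value" and "origin". The function name `deterministic_oid_hex` and the length-prefix scheme are illustrative choices, not something from mongoengine or bson. The `ObjectId` import is guarded because `bson` ships with pymongo and may not be installed.

```python
import hashlib


def deterministic_oid_hex(value: str, origin: str) -> str:
    """Return a 24-character hex string derived from value and origin.

    Hypothetical helper: the name and the encoding scheme are assumptions.
    """
    h = hashlib.sha256()
    for part in (value, origin):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    # An ObjectId is 12 bytes, so keep the first 12 bytes (24 hex chars).
    return h.digest()[:12].hex()


oid_hex = deterministic_oid_hex("ABC", "here")

try:
    from bson.objectid import ObjectId  # part of the pymongo distribution

    oid = ObjectId(oid_hex)
except ImportError:
    oid = None  # bson is not installed; oid_hex is still a valid 24-char hex string
```

Two caveats with this approach: truncating SHA-256 to 96 bits makes collisions very unlikely but not impossible, so a unique index on the two fields is still a reasonable safety net; and the first 4 bytes of an ObjectId are normally a timestamp, so `generation_time` on such an id would be meaningless.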

drizzt13
  • Can you clarify why you seek to create an `ObjectId` from 2 known fields? What benefit(s) do you believe you will enjoy? – Buzz Moschetti Jan 11 '22 at 15:53
  • Mainly for backward compatibility. Someone who uses the data my code creates expects the id to be an ObjectId. But I want to control the value of the OID, because sometimes I can have duplication issues before I save the data (multiple workers working on the same input doc, or rather the same doc created from the same value+origin). – drizzt13 Jan 11 '22 at 16:18
  • Another option could be: is it possible to use a field other than id in a DBRef (ReferenceField)? – drizzt13 Jan 11 '22 at 16:20
  • I know of no scenario where you would get duplicate `ObjectId` using the default internal algorithm before save, regardless of number of workers. – Buzz Moschetti Jan 11 '22 at 16:38
  • Of course I did not mean a duplicate ObjectId, but a duplicate doc, created by two different workers before saving to the db (docs in the memory of the Python runtime). – drizzt13 Jan 11 '22 at 16:41
  • Note that you could also keep the autogenerated ObjectId and rely on an index to ensure uniqueness of the 2 keys (one way is using the unique_with constraint from mongoengine). – bagerard Jan 11 '22 at 20:14

1 Answer


If you assert that value plus origin is truly unique, then just construct _id from both prior to insert. _id does not have to be autogenerated, nor must it be an ObjectId.

var d = {
    "value": "ABC",
    "origin": "here",
    "parent": "P"
};

// One possible format for combo of value + origin:
d['_id'] = d['value'] + "_" + d['origin'];
db.foo.insert(d);
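
Since the question uses Python, the same construction can be sketched there as well. This is only an illustration: the helper name `with_composite_id` is hypothetical, and the final insert is left as a comment because it assumes an already-connected pymongo collection.

```python
def with_composite_id(doc: dict) -> dict:
    """Return a copy of doc whose _id is built from value and origin.

    Hypothetical helper mirroring the shell example above.
    """
    out = dict(doc)
    # Same format as the shell example: value + "_" + origin.
    out["_id"] = f"{doc['value']}_{doc['origin']}"
    return out


d = with_composite_id({"value": "ABC", "origin": "here", "parent": "P"})

# With pymongo this could then be stored with collection.insert_one(d),
# where `collection` is an assumed, already-connected Collection object.
# A second insert with the same _id would fail with a duplicate key error,
# which is what stops two workers from saving the same value+origin twice.
```

One thing to watch: if "value" or "origin" can itself contain "_", the joined string is ambiguous ("a_b" + "c" and "a" + "b_c" collide). Choosing a separator that cannot occur in the data, or using an embedded document such as {"value": ..., "origin": ...} as the _id, avoids that.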
Buzz Moschetti