
I have content such as "this is my content" that I want to store in a MongoDB collection. Later I want to update the data, and I want to know the _id of the document without first sending a query to the database. My idea is to generate each _id from the content of the document: hash the content (with SHA-256 or similar) and pass the result to bson.ObjectId in Python. Then, whenever I want to update, for example, the timestamp in my collection, I simply regenerate the _id and send an update query to the collection. But I am getting the error below:

bson.errors.InvalidId: '3e2550e3ffd205d10900d893dd8d91be9f446d60' is not a valid ObjectId, it must be a 12-byte input or a 24-character hex string

Is the idea itself wrong, or am I doing something else incorrectly? Could you please guide me?

TomCat

1 Answer


Your idea is fine; the trick is that you do not have to use the ObjectId type for the _id field. The following pseudocode works fine with a string _id:

String sid = hex(sha256("this"+"is"+"my"+"content"));
Document doc = {_id:sid, theTimestamp:ISODate(), ... }
db.collection.insert(doc);
...
db.collection.update({_id:sid}, {$set: {theTimestamp:ISODate()}});
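
Since the question uses Python, the pseudocode above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `content_id` and `build_upsert` and the variable `collection` are made up for illustration, and the database call is left as a comment because it needs a running MongoDB server.

```python
import hashlib
from datetime import datetime, timezone


def content_id(*parts: str) -> str:
    """Return the SHA-256 hex digest of the concatenated parts (a 64-character string)."""
    joined = "".join(parts)  # mirrors "this"+"is"+"my"+"content" in the pseudocode
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def build_upsert(*parts: str):
    """Build the (filter, update) pair keyed on the content-derived string _id."""
    sid = content_id(*parts)
    query = {"_id": sid}
    update = {"$set": {"theTimestamp": datetime.now(timezone.utc)}}
    return query, update


query, update = build_upsert("this", "is", "my", "content")

# With a pymongo Collection object (assumed here to be named `collection`),
# the pair would be passed as:
#     collection.update_one(query, update, upsert=True)
```

Because the same content always hashes to the same string, the _id can be recomputed at update time without reading anything back from the collection.
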
Buzz Moschetti
  • The problem is that I don't want to store `_id` as a string. I tested your suggestion, but pymongo stores the `_id` as a string. – TomCat Nov 24 '21 at 16:11
  • I don't understand. You want to be able to construct the `_id` based on pieces of the data itself. You later want to be able to rapidly assemble those pieces and do a fast lookup (or update) directly to the `_id`. That means `_id` needs to be `string`. `ObjectId` is useful as a "nearly-unique" ID when you do NOT want to construct the `_id` yourself. – Buzz Moschetti Nov 24 '21 at 18:08
  • As you know, using a string as `_id` slows queries down. So the `_id` has to be an `ObjectId`, an `int`, or something like that to avoid slow queries. – TomCat Nov 24 '21 at 18:51
  • That is not true. Where did you read that? The basic performance is the same for any type in `_id`; the length of the key could make a slight difference. `ObjectId` is represented as 12 bytes of data. A `string` of about 12 bytes will have the same performance. A string of 64 bytes might be slightly slower. `_id` automatically carries a unique index AND uses the special IDHACK in the lookup plan (shown in explain()). – Buzz Moschetti Nov 24 '21 at 22:26
  • This may be useful: https://stackoverflow.com/questions/28895067/using-uuids-instead-of-objectids-in-mongodb – Buzz Moschetti Nov 24 '21 at 22:32
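
The error in the question follows from the sizes involved: the rejected value has 40 hex characters (the length of a SHA-1 hex digest), and a SHA-256 hex digest has 64, while the message asks for 12 bytes or 24 hex characters. If an `ObjectId`-typed `_id` is still wanted, one option, which is not something the answer proposes, would be to keep only the first 12 bytes of the digest. The sketch below uses only `hashlib` to show the lengths; the names `OBJECTID_BYTES` and `truncated_digest` are hypothetical.

```python
import hashlib

OBJECTID_BYTES = 12  # size named in the InvalidId error message


def truncated_digest(content: str) -> bytes:
    """Return the first 12 bytes of the SHA-256 digest of the content."""
    return hashlib.sha256(content.encode("utf-8")).digest()[:OBJECTID_BYTES]


content = "this is my content"
sha1_hex = hashlib.sha1(content.encode("utf-8")).hexdigest()
sha256_hex = hashlib.sha256(content.encode("utf-8")).hexdigest()
raw = truncated_digest(content)

print(len(sha1_hex))    # 40 hex characters: too long for an ObjectId
print(len(sha256_hex))  # 64 hex characters: too long for an ObjectId
print(len(raw))         # 12 bytes
print(len(raw.hex()))   # 24 hex characters

# With pymongo installed, bson.ObjectId(raw) or bson.ObjectId(raw.hex())
# would then satisfy the size check quoted in the error.
```

Truncating leaves 96 bits of the hash, so collisions become more likely than with the full digest; whether that is acceptable depends on how many documents the collection holds.
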