1

Using MongoDB 4.0 I have to compute a new field basing on existing field. An example of documents in mycollection:

{
    _id: ...,
    f1: 'a',
    f2: 'b'
}

My python code, after connecting to DB:

Please note that myJoin is here just for example. In real use scenario this function is quite complex (custom cryptographic calculations and bytes manipulations).

def myJoin(left, right):
  return left+'-myDelimiter-'+right


myPipeline = [
    {
        '$match': {
            'f1': {
                '$exists': True
            }
        }
    }, {
        '$addFields': {
            'f3': myJoin('$f1', '$f2')
        }
    }
]

mydb.mycollection.aggregate(myPipeline)

However in DB I see:

{
    _id: ...,
    f1: 'a',
    f2: 'b',
    f3: '$f1-myDelimiter-$f2'
}

But I want:

{
    _id: ...,
    f1: 'a',
    f2: 'b',
    f3: 'a-myDelimiter-b'
}

For the moment I am using pipeline aggregations. But other strategies are welcome.

shogitai
  • 1,823
  • 1
  • 23
  • 50
  • 1
    There is no way to do that. All stages of aggregation pipeline are executed on the database side. Your python function exists on the application side. These are at least different processes, and very likely different systems/hosts. You will need to implement your `myJoin` logic using mongodb expression operators https://docs.mongodb.com/manual/reference/operator/aggregation/#aggregation-pipeline-operators . In this particular example it is something like https://docs.mongodb.com/manual/reference/operator/aggregation/concat/#mongodb-expression-exp.-concat – Alex Blex Jul 07 '21 at 12:02
  • Ok, I suspected it couldn't be done. I am forced to do a find_one (), perform the calculations and then update the document with a find_one_and_update ({_ id: '...'}) referencing the two queries with '_id'. It gets quite long time execution over billions of records. – shogitai Jul 07 '21 at 12:29
  • 1
    Yes, but if we are talking about CPU hungry cryptography it wouldn't be much faster on the database side. If time is more expensive than money you can run myJoin on multiple clients in parallel. Billions of records seems large enough. If it is one off task you could hire some cloud VMs or even functions like AWS Lambda. – Alex Blex Jul 07 '21 at 12:38
  • Thanks, very well explained. Since is only need tu run it once... for this time its okay to wait for some days. – shogitai Jul 08 '21 at 16:36

1 Answers1

1

You can use $concat operator to concat fields with delimiters,

{
  '$addFields': {
    'f3': { '$concat': ['$f1', '-myDelimiter-', '$f2'] }
  }
}

I used myJoin just for example. My custom function perform a series of cryptographic computations and bytes manipulations.

I don't think it is possible to integrate into a query for python lang in current MongoDB v4.4,

There is a $function operator starting from MongoDB 4.4, There you can write javascript code and execute in query, but, it's expensive for query performance.

I would suggest you to do this operation after the query on the result.

turivishal
  • 34,368
  • 7
  • 36
  • 59
  • Thanks, in the case of the join of strings it does the trick. I used myJoin just for example. My custom function perform a series of cryptographic computations and bytes manipulations. – shogitai Jul 07 '21 at 12:25