13

How do you query mongodb to find the length of a particular string/text field?

And how would you find the max length of a query set?

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
user1018364
  • 285
  • 1
  • 4
  • 10

6 Answers6

4

Sky is the limit! No, actually it is 16 MB for a document in mongodb. This can be the maximum length of a string in the records.

In order to find the maximum length in a query set, you can do this work around:

  1. Keep the length of text together with the text itself.
  2. Sort your result set descending on the length.
  3. Get the first element which has maximum length.
ogzd
  • 5,532
  • 2
  • 26
  • 27
4

Unfortunately the aggregation framework doesn't support a "len" operator to automatically convert strings to their length while you do a query. So you have to solve this in your own code. You could

  1. use a MapReduce function to calculate string lengths
  2. query for the strings and calculate their length on the application layer

The difference between these method is that the first runs on the database while the latter runs on your application server. I would recommend the latter option, because MapReduce can be very slow and cumbersome to use.

Philipp
  • 67,764
  • 9
  • 118
  • 153
3

Starting Mongo 3.4, the $strLenCP aggregation operator can be used to get a string's length:

// { a: "Hello World" }
// { a: "42" }
// { a: "Hello World!" }
db.collection.aggregate([{ $addFields: { length: { $strLenCP: "$a" } } }])
// { a: "Hello World",  length: 11 }
// { a: "42",           length: 2 }
// { a: "Hello World!", length: 12 }

and in order to get the max length from all documents via a $group/$max stage:

db.collection.aggregate([
  { $group: { _id: null, longest: { $max: { $strLenCP: "$a" } } } }
])
// { "_id" : null, longest: 12 }
Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
2

how about use regular expression instead.

> db.apps.find({$where:"(this.id.length gt 6) && (this.id.length lt 15) " } ).count();
2548
> db.apps.find({$where:" (this.id.length gt 6) && (this.id.length lt 15) " } ).explain();
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 2548,
    "nscannedObjects" : 88736,
    "nscanned" : 88736,
    "nscannedObjectsAllPlans" : 88736,
    "nscannedAllPlans" : 88736,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 1523,
    "indexBounds" : {

    },
    "server" : "shuhaimac.local:27017"
}
> db.apps.find({id:/\w{7,16}/i}).count();
2548
> db.apps.find({id:/\w{7,16}/i}).explain();
{
    "cursor" : "BtreeCursor id_1 multi",
    "isMultiKey" : false,
    "n" : 2548,
    "nscannedObjects" : 2548,
    "nscanned" : 88736,
    "nscannedObjectsAllPlans" : 2548,
    "nscannedAllPlans" : 88736,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 122,
    "indexBounds" : {
        "id" : [
            [
                "",
                {

                }
            ],
            [
                /\w{7,16}/i,
                /\w{7,16}/i
            ]
        ]
    },
    "server" : "shuhaimac.local:27017"
}
shuhai
  • 49
  • 4
1

So, I hope this helps. :-) I ran into the same problem - and it took me a while to get the map-reduce working.

$response = $Mongo->yourdb->command(array(
    "mapreduce" => "yourcollection",
    "map" => new MongoCode(" function() { emit( this.groupbykey, this.thestring.length ); } "),
    "reduce" => new MongoCode(" function(k, vals) { return Math.max.apply(null, vals); } "),
    "query" => array("groupbykey" => "somevalue"),
    "out" => array("inline" => 0)
));

Response will hold the map-reduce result

Array
(
    [results] => Array
        (
            [0] => Array
                (
                    [_id] => groupbykeyvalue
                    [value] => 106
                )

        )

    [counts] => Array
        (
            [input] => 7341
            [emit] => 7341
            [reduce] => 76
            [output] => 1
        )

    [timeMillis] => 189
    [timing] => Array
        (
            [shardProcessing] => 171
            [postProcessing] => 17
        )

    [shardCounts] => Array
        (
            [someshard:27017] => Array

Good luck, let me know if you need a different variant!

Bas Kuis
  • 748
  • 1
  • 7
  • 20
-2

Unlike SQL MongoDB doesn't really know the length of fields as such. At most when it indexes it knows that the field is either under 1024 bytes or not.

As such this is something you will probably have to fix client side. You could use a $where here but I think your looking at this wrong if you want to do that.

You could also use and MR here as @Philipp states but again you are probably looking into the wrong thing here.

A query in MongoDB is in fact a BSON document. As such the maximum length of a query set (depending on what you define as a "query set") is always 16MB (at the moment).

Many drivers provide a means by which to encode a structure (hash or dict or whatever) to BSON allowing you to judge the length of the encoded string to understand the size of your query.

Sammaye
  • 43,242
  • 7
  • 104
  • 146
  • 5
    this is simply untrue: MongoDB absolutely knows the length of the strings, it's part of the bson spec for the type string, the first 4 bytes is the length of the string. There is just no operator that allows you to query on it. – Asya Kamsky May 17 '14 at 20:40
  • @AsyaKamsky Fair enough should have checked the spec before writing that – Sammaye May 17 '14 at 20:42