Update: this is a follow-up to "MongoDB Get names of all keys in collection".

As pointed out by Kristina, one can use MongoDB's map/reduce to list the keys in a collection:

db.things.insert( { type : ['dog', 'cat'] } );
db.things.insert( { egg : ['cat'] } );
db.things.insert( { type :  [] }); 
db.things.insert( { hello : []  } );

mr = db.runCommand({"mapreduce" : "things",
"map" : function() {
    for (var key in this) { emit(key, null); }
},  
"reduce" : function(key, stuff) { 
   return null;
}}) 

db[mr.result].distinct("_id")

//output: [ "_id", "egg", "hello", "type" ]

This works fine as long as we only want the keys at the first level of depth. However, it fails to retrieve keys located at deeper levels. If we add a new record:

db.things.insert({foo: {bar: {baaar: true}}})

and run the map-reduce + distinct snippet above again, we get:

[ "_id", "egg", "foo", "hello", "type" ] 

But we do not get the bar and baaar keys, which are nested further down in the data structure. The question is: how do I retrieve all keys, no matter their level of depth? Ideally, I would like the script to walk down through all levels of depth, producing an output such as:

["_id","egg","foo","foo.bar","foo.bar.baaar","hello","type"]      

Thank you in advance!

Community
Andrea Fiore

4 Answers

OK, this is a little more complex because you'll need to use some recursion.

To make the recursion happen, you'll need to be able to store some functions on the server.

Step 1: define some functions and put them server-side

isArray = function (v) {
  return v && typeof v === 'object' && typeof v.length === 'number' && !(v.propertyIsEnumerable('length'));
}

m_sub = function(base, value){
  for(var key in value) {
    emit(base + "." + key, null);
    if( isArray(value[key]) || typeof value[key] == 'object'){
      m_sub(base + "." + key, value[key]);
    }
  }
}

db.system.js.save( { _id : "isArray", value : isArray } );
db.system.js.save( { _id : "m_sub", value : m_sub } );

Step 2: define the map and reduce functions

map = function(){
  for(var key in this) {
    emit(key, null);
    if( isArray(this[key]) || typeof this[key] == 'object'){
      m_sub(key, this[key]);
    }
  }
}

reduce = function(key, stuff){ return null; }

Step 3: run the map reduce and look at results

mr = db.runCommand({"mapreduce" : "things", "map" : map, "reduce" : reduce,"out": "things" + "_keys"});
db[mr.result].distinct("_id");

The results you'll get are:

["_id", "_id.isObjectId", "_id.str", "_id.tojson", "egg", "egg.0", "foo", "foo.bar", "foo.bar.baaar", "hello", "type", "type.0", "type.1"]
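
If you want to try the traversal without a MongoDB server, here is a minimal sketch that stubs out emit and runs the same logic over plain objects. The emit stub, the mapDoc wrapper (map with the document passed as an argument instead of this), and the numeric _id values are assumptions made for illustration, so the _id.* entries produced by real ObjectIds do not show up here.

```javascript
// Local stand-in for MongoDB's emit(): record every emitted key in a Set.
const emitted = new Set();
function emit(key, value) {
  emitted.add(key);
}

function isArray(v) {
  return Array.isArray(v);
}

// Same recursion as m_sub in the answer: emit "base.key", then descend.
function m_sub(base, value) {
  for (const key in value) {
    emit(base + "." + key, null);
    if (isArray(value[key]) || typeof value[key] === "object") {
      m_sub(base + "." + key, value[key]);
    }
  }
}

// Same body as the map function, but taking the document as a parameter.
function mapDoc(doc) {
  for (const key in doc) {
    emit(key, null);
    if (isArray(doc[key]) || typeof doc[key] === "object") {
      m_sub(key, doc[key]);
    }
  }
}

// The documents from the question, with plain numbers as _id (assumption).
const sampleDocs = [
  { _id: 1, type: ["dog", "cat"] },
  { _id: 2, egg: ["cat"] },
  { _id: 3, type: [] },
  { _id: 4, hello: [] },
  { _id: 5, foo: { bar: { baaar: true } } },
];

sampleDocs.forEach(mapDoc);
const simulatedKeys = [...emitted].sort();
console.log(simulatedKeys);
```

The array indices (egg.0, type.0, type.1) appear here as well, which is problem #2 discussed below.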

There's an obvious problem here: we're picking up some unexpected fields. 1. the _id data 2. the .0 (on egg and type)

Step 4: Some possible fixes

For problem #1 the fix is relatively easy. Just modify the map function. Change this:

emit(base + "." + key, null); if( isArray...

to this:

if(key != "_id") { emit(base + "." + key, null); if( isArray... }

Problem #2 is a little more dicey. You wanted all keys and technically "egg.0" is a valid key. You can modify m_sub to ignore such numeric keys. But it's also easy to see a situation where this backfires. Say you have an associative array inside of a regular array, then you want that "0" to appear. I'll leave the rest of that solution up to you.
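
One possible way to handle both problems, sketched in plain JavaScript so it can be tried outside MongoDB. This is an assumption-laden variant, not the code from the steps above: collectPaths is a hypothetical helper, it reports _id but never descends into it (as far as I can tell, the _id.str style entries come from the top-level loop descending into _id), and it leaves array positions out of the path, while a genuine "0" key on an object is kept.

```javascript
// Hypothetical helper: walk one document and add dotted key paths to `out`.
// Array positions are not part of the path, so [{a: 1}] stored under "items"
// yields "items.a" rather than "items.0" and "items.0.a".
function collectPaths(value, base, out) {
  if (Array.isArray(value)) {
    for (const element of value) {
      collectPaths(element, base, out);
    }
    return;
  }
  if (value === null || typeof value !== "object") {
    return; // scalars have no keys of their own
  }
  for (const key of Object.keys(value)) {
    const path = base ? base + "." + key : key;
    out.add(path);
    // Assumption: _id is reported as a key but never descended into.
    if (key !== "_id") {
      collectPaths(value[key], path, out);
    }
  }
}

const paths = new Set();
collectPaths({ _id: { str: "abc" }, type: ["dog", "cat"] }, "", paths);
collectPaths({ _id: 2, items: [{ "0": "x", name: "y" }] }, "", paths);

const sortedPaths = [...paths].sort();
console.log(sortedPaths);
```

Whether dropping the array positions is what you want depends on how the key list will be used; if you need to tell items.0.name apart from items.1.name, keep the index in the path instead.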

Vala
Gates VP
  • Thanks Gates! I've also found another solution (which however does not involve using map/reduce) explained here: http://groups.google.com/group/mongodb-user/browse_thread/thread/3e10a4b409dd6cb4/ccc9de1fafafe37e?lnk=gst&q=list+keys+depth#ccc9de1fafafe37e – Andrea Fiore Jun 13 '10 at 10:44
  • See the new code. The first part of step 3 refers to the functions named in step 2. – Gates VP Oct 21 '13 at 21:13
  • So the date on this answer is June 2010, it's quite likely that you'll need to add the new `out` parameter which did not exist at the time of this writing. Frankly, you probably don't even want to use M/R for this as it can probably be done using the much better Aggregation Framework. – Gates VP Oct 22 '13 at 23:49
  • You are freaking awesome, even in 2017! – Orelsanpls Sep 15 '17 at 15:15
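
Following up on the comment about the Aggregation Framework: a minimal sketch of a pipeline that lists field names, assuming a MongoDB version that supports the $objectToArray operator (3.4.4 or later). It only covers top-level keys, so it replaces the original map/reduce snippet, not the recursive one. The variable name topLevelKeysPipeline is made up for illustration.

```javascript
// Sketch of an aggregation pipeline that collects top-level field names.
const topLevelKeysPipeline = [
  // Turn each document into an array of { k: <field name>, v: <value> } pairs.
  { $project: { kv: { $objectToArray: "$$ROOT" } } },
  // One output document per field.
  { $unwind: "$kv" },
  // Gather the distinct field names into a single array.
  { $group: { _id: null, keys: { $addToSet: "$kv.k" } } },
];

// In the mongo shell this would be run as:
//   db.things.aggregate(topLevelKeysPipeline)
console.log(JSON.stringify(topLevelKeysPipeline, null, 2));
```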

With Gates VP's and Kristina's answers as inspiration, I created an open source tool called Variety which does exactly this: https://github.com/variety/variety

Hopefully you'll find it to be useful. Let me know if you have questions, or any issues using it.

j0k
James Cropcho
  • It's kind of sad that we need an open-source tool to query the "schema". I see two reasons why people choose MongoDB: first, because it genuinely suits true "document" storage (data that is highly nested, structure that cannot be known a priori, or queries for which only MongoDB will do); and second, as a way for lazy or junior developers to avoid a more structure-enforced (typically relational) database system. In my experience the latter is far, far too common, especially among start-ups. Just because JSON is easy doesn't mean it's right. – Adam Donahue Apr 10 '14 at 00:04

I solved problem #2 stated by Gates, where for example data.0, data.1 and data.2 were returned. Even though these are valid keys as stated above, I wanted to get rid of them for presentation purposes. I solved it with a quick edit to the m_sub function, as shown below.

const m_sub = function (base, value) {
  for (var key in value) {
    // Skip _id and numeric keys such as array indices ("0", "1", ...)
    if (key != "_id" && isNaN(key)) {
      emit(base + "." + key, null);
      if (isArray(value[key]) || typeof value[key] == 'object') {
        m_sub(base + "." + key, value[key]);
      }
    }
  }
};

This version already includes the fix for problem #1 described above. The only additional change is in the first if-statement, where I changed this:

if(key != "_id")

to this, using the isNaN(x) function:

if(key != "_id" && isNaN(key))

Hope this helps someone, and if there is a problem with this solution please give feedback!
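
One thing worth checking before relying on isNaN here is which keys it actually filters out, since it converts the string to a number first. A small sketch with made-up sample keys (sampleKeys and the digits-only regex alternative are assumptions for illustration, not part of the solution above):

```javascript
// Made-up keys: array-style indices, ordinary names, and a few edge cases.
const sampleKeys = ["0", "12", "name", "2024", "1e3", "", "0x10", "a1"];

// isNaN(key) keeps only keys that do not convert to a number.
const keptByIsNaN = sampleKeys.filter((key) => isNaN(key));
console.log(keptByIsNaN); // [ 'name', 'a1' ]

// Stricter alternative: drop a key only if it consists of digits alone.
const keptByDigitCheck = sampleKeys.filter((key) => !/^\d+$/.test(key));
console.log(keptByDigitCheck); // [ 'name', '1e3', '', '0x10', 'a1' ]
```

Either way, a plain object that legitimately uses a key such as "2024" loses it. Telling such keys apart from array positions would need a check on whether the parent value is an array.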

Andreas

As a simple function:

const getProps = (db, collection) => new Promise((resolve, reject) => {
  db
  .collection(collection)
  .mapReduce(function() {
    for (var key in this) { emit(key, null) }
  }, (prev, next) => null, {
    out: collection + '_keys'
  }, (err, collection_props) => {
    // Stop here on failure; collection_props is not usable when err is set
    if (err) return reject(err)

    collection_props
    .find()
    .toArray()
    .then(
      props => resolve(props.map(({_id}) => _id)),
      reject // pass toArray() failures on to the caller as well
    )
  })
})
Ahmet Şimşek