Would like to learn from you again on Node.js and mongoose.

I have a mongoose schema defined, and findOne() returns a doc like the one below. The actual doc has many more elements under "resource".

{
    "metadata": {"isActive": true, "isDeleted": false },
    "test": "123",
    "resource": {
        "id": "59e94f3f6d5789611ce9926f",
        "resourceType": "Patient",
        "active": true,
        "gender": "male",
        "birthDate": "2000-01-01T00:00:00.000Z",
        "extension": [
            {
                "url": "hxxp://example.com/fhir/StructureDefinition/patient-default-bundle",
                "valueCodeableConcept": {
                    "code": "sys",
                    "display": ""
                }
            }
        ],
        "link": [],
        "careProvider": [],
        "communication": [],
        "animal": {
            "genderStatus": {
                "coding": []
            },
            "breed": {
                "coding": []
            },
            "species": {
                "coding": []
            }
        },
        "contact": []
    }
}

Question: how can I select all the non-empty fields under 'resource'?

My expected result is below, i.e. all non-empty fields under the 'resource' element.

{
  "id": "59e94f3f6d5789611ce9926f",
  "resourceType": "Patient",
  "active": true,
  "gender": "male",
  "birthDate": "2000-01-01T00:00:00.000Z",
  "extension": [
      {
          "url": "hxxp://example.com/fhir/StructureDefinition/patient-default-bundle",
          "valueCodeableConcept": {
              "code": "sys",
              "display": ""
          }
      }
  ]
}

My current code:

module.exports.findById = function (req, res, next) {
    var resourceId = req.params.resourceId;
    var resourceType = req.params.resourceType;
    var thisModel = require('mongoose').model(resourceType);

    console.log("findById is being called by the API [" + resourceType + "][" + resourceId + "]");
    thisModel.findOne(
        {'resource.id': resourceId, 'metadata.isActive': true, 'metadata.isDeleted': false},
        'resource -_id',
        function(err, doc) {
            if (err) {
                globalsvc.sendOperationOutcome(res, resourceId, "Error", "findOne() Not Found", err, 404);
            }
            else {
                if (doc) {
                    sendJsonResponse(res, 200, doc);
                }  else {
                    globalsvc.sendOperationOutcome(res, resourceId, "Error", "Id: [" + resourceId + "] Not Found", err, 404);
                }
            }
        }
    );
}
Autorun
  • You mean everything which does not have an empty array property? As in "return the documents but don't show those properties if empty"? If that's your ask, then it's actually not very simple at all. The best case would be simply not to store the property "at all" unless you have some data to put into it. That's a lot easier than stripping the properties returned by the server. – Neil Lunn Nov 07 '17 at 05:26
  • Thx Neil, I want everything under 'resource' which is not empty. Moreover, the resource: { } wrapper needs to be removed too. Please see my expected result. I agree with you that those empty fields shouldn't be stored in the first place. For example, if the doc is { 'resource': { 'id': '123', 'gender': ""}}, my expected result is {'id': '123'} since 'gender' is empty. – Autorun Nov 07 '17 at 05:34
  • That's what I thought you meant. It's not a simple thing to do. As a "schemaless" document oriented store, the general intention is that if you don't have data for a property then you don't store it at all. Storing empty strings or empty arrays actually is "something". And it takes a really advanced and compute intensive projection with the aggregation framework in order to "remove" those before returning the results. So the general advice here is "don't store empty properties" if you don't want them returned. – Neil Lunn Nov 07 '17 at 05:37
  • Hi Neil, thanks again. You are right. I shouldn't have saved those empty fields. Is there any good sample code that checks for empty fields and removes them in a generic, recursive way? ;) I have several very complicated and deep schemas. – Autorun Nov 07 '17 at 05:46
  • Hi Neil, how about returning all the fields under 'resource' regardless of whether they are empty? How can I extract all the fields under 'resource' in a simple recursive way? – Autorun Nov 07 '17 at 05:52

2 Answers

As noted before, it is far better not to store the empty arrays in the MongoDB collection in the first place than to try to strip them out when the data is returned. You can only omit them from the returned results either by using aggregation framework features in the latest releases (and even then not recursively), or by letting the server return the whole object and then stripping those properties from the documents before passing them on.

So I would really see this as a two step process to fix the data.

Change Schema to Omit Empty Arrays

Of course you state that you have many more fields in the schema, but from what I can see I can give you a few examples. Basically you need to set the default value of anything that is an array to undefined. Just listing a few as a partial for your schema:

"resource": {
  "extension": {
    "type": [{
      "url": String,
      "valueCodeableConcept": {
        "code": String,
        "display": String
      }
    }],
    "default": undefined
  },
  "link": { "type": [String], "default": undefined },
  "animal": {
    "genderStatus": { 
      "coding": { "type": [String], "default": undefined }
    },
    "breed": {
      "coding": { "type": [String], "default": undefined }
    }
  }
}

That should give you the general idea. With those "default" values, mongoose will not attempt to write an empty array when no other data is provided. Once you fix your schema by annotating each array definition like that, no more empty arrays will be created.

Trimming the data

This should be a "one off" operation to remove all the properties that are merely hosting empty arrays. That means you also really want to get rid of properties that have nothing but an empty array under each of their inner keys, such as the "animal" property.

So I would just use a simple listing to go through the data and rewrite it:

const MongoClient = require('mongodb').MongoClient;

const uri = 'mongodb://localhost/test',
      collectionName = 'junk';

function returnEmpty(obj) {
  var result = {};

  Object.keys(obj).forEach(k => {
    if ( obj[k] !== null && typeof(obj[k]) === "object" && obj[k].constructor === Object ) {
      let temp = returnEmpty(obj[k]);
      if (Object.keys(temp).length !== 0)
        result[k] = temp;
    } else if ( !((Array.isArray(obj[k]) && obj[k].length > 0)
      || !Array.isArray(obj[k]) ) )
    {
      result[k] = obj[k];
    }
  });

  return result;
}

function stripPaths(obj,cmp) {
  var result = {};

  Object.keys(obj).forEach( k => {
    if ( Object.keys(obj[k]).length !== Object.keys(cmp[k]).length ) {
      result[k] = stripPaths(obj[k], cmp[k]);
    } else {
      result[k] = "";
    }
  });

  return result;
}

function dotNotate(obj,target,prefix) {
  target = target || {};
  prefix = prefix || "";

  Object.keys(obj).forEach( key => {
    if ( typeof(obj[key]) === 'object' ) {
      dotNotate(obj[key], target, prefix + key + '.');
    } else {
      target[prefix + key] = obj[key];
    }
  });

  return target;
}

function log(data) {
  console.log(JSON.stringify(data, undefined, 2))
}

(async function() {

  let db;

  try {

    db = await MongoClient.connect(uri);

    let collection = db.collection(collectionName);

    let ops = [];
    let cursor = collection.find();

    while ( await cursor.hasNext() ) {
      let doc = await cursor.next();
      let stripped = returnEmpty(doc);
      let res = stripPaths(stripped, doc);
      let $unset = dotNotate(res);

      ops.push({
        updateOne: {
          filter: { _id: doc._id },
          update: { $unset }
        }
      });

      if ( ops.length > 1000 ) {
        await collection.bulkWrite(ops);
        ops = [];
      }
    }

    if ( ops.length > 0 ) {
      await collection.bulkWrite(ops);
      log(ops);
      ops = [];
    }


  } catch(e) {
    console.error(e);
  } finally {
    // db stays undefined if the connection attempt itself failed
    if (db) db.close();
  }

})();

That basically generates an operation to be fed to bulkWrite() for each document in your collection to $unset the paths that would have empty properties.

For your supplied document, the update would look like:

[
  {
    "updateOne": {
      "filter": {
        "_id": "5a0151108204f6bce9baf86f"
      },
      "update": {
        "$unset": {
          "resource.link": "",
          "resource.careProvider": "",
          "resource.communication": "",
          "resource.animal": "",
          "resource.contact": ""
        }
      }
    }
  }
]

This identifies all properties that held an empty array, and even removes ALL of the keys under "animal", since each of its keys holds nothing but an empty array and that key would just be an empty object if we removed only the sub-keys. So instead we remove that whole key and its sub-keys.

Once run, all of those unneeded keys will be removed from the stored documents and then any query will simply only return what data is actually defined. So this is a little work in the short term for a longer term gain.
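If you would rather compute the $unset paths in a single pass, something along these lines might work. This is only a sketch and not part of the listing above: the names isHollow and unsetPaths are made up for the illustration, and it assumes that "empty" means an empty array, or a plain object whose keys all lead to nothing but empty arrays. Comparing content rather than key counts also keeps a sibling such as { b: [], c: 1 } from being unset as a whole.

```javascript
// Hypothetical helper: true for an empty array, or for a plain object whose
// keys all (recursively) hold nothing but empty arrays.
function isHollow(value) {
  if (Array.isArray(value)) return value.length === 0;
  if (value !== null && typeof value === 'object' && value.constructor === Object) {
    const keys = Object.keys(value);
    // An object with no keys at all is left alone, as in the listing above.
    return keys.length > 0 && keys.every(k => isHollow(value[k]));
  }
  return false;
}

// Hypothetical helper: collects the shallowest hollow paths in dot notation,
// in the shape expected by the $unset operator.
function unsetPaths(obj, prefix = '', target = {}) {
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    const path = prefix + key;
    if (isHollow(value)) {
      target[path] = '';
    } else if (value !== null && typeof value === 'object' && value.constructor === Object) {
      // Only plain objects are descended into; arrays, dates and ids are skipped.
      unsetPaths(value, path + '.', target);
    }
  }
  return target;
}

// Cut-down version of the document from the question.
const sample = {
  metadata: { isActive: true, isDeleted: false },
  resource: {
    id: '59e94f3f6d5789611ce9926f',
    extension: [{ url: 'x' }],
    link: [],
    animal: { breed: { coding: [] }, species: { coding: [] } },
    contact: []
  }
};

console.log(JSON.stringify(unsetPaths(sample), undefined, 2));
```

The returned object could be passed as the $unset value in the updateOne operations built above.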

Manipulate Result

Of course for the lazy, you can simply apply the basic function used to return the paths to remove with reversed logic to remove the paths from the returned object:

function returnStripped(obj) {
  var result = {};

  Object.keys(obj).forEach(k => {
   if ( obj[k] !== null && typeof(obj[k]) === "object" && obj[k].constructor === Object ) {
     var temp = returnStripped(obj[k]);
     if (Object.keys(temp).length !== 0)
       result[k] = temp;
   } else if ( ((Array.isArray(obj[k]) && obj[k].length > 0) || !Array.isArray(obj[k])) ) {
     result[k] = obj[k];
   }
  });

  return result;
}


db.collection.find().map(returnStripped)

Which simply removes the unwanted keys from the result.
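For the findOne() case in the question, a slightly more defensive variant could be applied to the "resource" sub-document directly. Treat this as a sketch under assumptions: stripEmpty is a made-up name, it tolerates null values, it also cleans plain objects nested inside arrays, and it expects a plain object as input (with mongoose that would mean calling .lean() on the query or .toObject() on the document first). Empty strings are kept, to match the expected result in the question.

```javascript
// Hypothetical variant of returnStripped: drops empty arrays and objects
// that end up with no keys, and leaves every other value untouched.
function stripEmpty(value) {
  if (Array.isArray(value)) {
    return value.map(stripEmpty);
  }
  if (value !== null && typeof value === 'object' && value.constructor === Object) {
    const result = {};
    for (const key of Object.keys(value)) {
      const cleaned = stripEmpty(value[key]);
      const emptyArray = Array.isArray(cleaned) && cleaned.length === 0;
      const emptyObject = cleaned !== null && typeof cleaned === 'object' &&
        cleaned.constructor === Object && Object.keys(cleaned).length === 0;
      if (!emptyArray && !emptyObject) result[key] = cleaned;
    }
    return result;
  }
  // Primitives, null, dates, ObjectIds and so on are returned as they are.
  return value;
}

// Cut-down version of the document from the question.
const doc = {
  resource: {
    id: '59e94f3f6d5789611ce9926f',
    resourceType: 'Patient',
    active: true,
    gender: 'male',
    extension: [
      { url: 'x', valueCodeableConcept: { code: 'sys', display: '' } }
    ],
    link: [],
    animal: { genderStatus: { coding: [] }, breed: { coding: [] } },
    contact: []
  }
};

console.log(JSON.stringify(stripEmpty(doc.resource), undefined, 2));
```

Passing doc.resource rather than doc also drops the outer "resource" wrapper, which is what the question asked for.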

It would do the job, but the greater gain here is from actually fixing the schema and updating the data permanently.

Neil Lunn
  • THANKS A LOT! I will implement and test it out! – Autorun Nov 07 '17 at 13:29
  • Thanks Neil. I used the "Manipulate Result" approach, then twist it for my use case. It works well! – Autorun Nov 07 '17 at 23:44
  • @Autorun You can do whatever you want. Note that the "long" description there does make the point that you are "still" both returning and "storing" a lot of unnecessary data by keeping the empty keys. Personally I vote with my "hip pocket" and choose to pay as little for data transfer as I possibly can. So whilst you "can" strip the data after it's returned, you "should" look at making your stored data conform to "only what you need". That's the point of the recommendations contained. – Neil Lunn Nov 07 '17 at 23:58

I think you can go with this.

thisModel.findOne({ extension: { $gt: [] } })
Srikar Jammi
  • Thanks Srikar. My find condition must be {'resource.id': resourceId, 'metadata.isActive': true, 'metadata.isDeleted': false}. Once I got the returned doc, I would like to return a subset of the document such as "resource.*" – Autorun Nov 07 '17 at 05:29
  • Can you please go through this once? https://stackoverflow.com/questions/14789684/find-mongodb-records-where-array-field-is-not-empty – Srikar Jammi Nov 07 '17 at 05:41