
I have a list of documents I retrieve from a web API. All documents in this list have the same structure and 2 fields combined create a natural key.

I take this list and persist it into a collection.

A month or so later, I will request a fresh subset of documents from the API, filtered on a specific value of one of the two fields. However, the new subset will not necessarily include all of the documents previously persisted.

I need to identify and remove old documents not in the fresh subset.

In SQL this is:

delete a from olderset a
left join newersubset b
 on a.f1 = b.f1
  and a.f2 = b.f2
where b.f2 is null
-- or something like that

Think of f1 as companyName and f2 as transactionID. olderset will contain transactions for many different companies (different companyName values).

But my newer API call only returns the transactions of one specific company.

In Mongoose, what is the best strategy to remove a specific company's older transactions from the olderset collection when those documents do not exist in the newersubset list?

Can you offer a code example?

Sample data:

[
  { "f1": "f1a", "f2": "f2a", "f3": "f3a" }
  , { "f1": "f1b", "f2": "f2b", "f3": "f3b" }
  , { "f1": "f1c", "f2": "f2c", "f3": "f3c" }
  , { "f1": "f1d", "f2": "f2d", "f3": "f3d" }
]

The second round:

[
  { "f1": "f1a", "f2": "f2a", "f3": "f3a" }
  , { "f1": "f1b", "f2": "f2b", "f3": "f3b" }
  , { "f1": "f1c", "f2": "f2c", "f3": "f3c" }
]
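A minimal sketch of the SQL anti-join above in plain JavaScript, run against the sample data. It builds a Set of (f1, f2) keys from the second round and keeps the older documents whose key is missing. `naturalKey` and `findStale` are hypothetical helper names, and the sketch assumes both lists fit comfortably in memory. Because every sample document has a different f1, this version compares across all companies, exactly like the SQL.

```javascript
// Build one string key from the two natural-key fields.
// JSON.stringify of a two-element array avoids collisions such as
// ("a|b", "c") vs ("a", "b|c") that naive string concatenation would allow.
function naturalKey(doc) {
  return JSON.stringify([doc.f1, doc.f2]);
}

// Anti-join: return the documents in olderSet whose (f1, f2) pair does not
// appear in newerSubset (the SQL "LEFT JOIN ... WHERE b.f2 IS NULL").
function findStale(olderSet, newerSubset) {
  const fresh = new Set(newerSubset.map(naturalKey));
  return olderSet.filter((doc) => !fresh.has(naturalKey(doc)));
}

const olderSet = [
  { f1: "f1a", f2: "f2a", f3: "f3a" },
  { f1: "f1b", f2: "f2b", f3: "f3b" },
  { f1: "f1c", f2: "f2c", f3: "f3c" },
  { f1: "f1d", f2: "f2d", f3: "f3d" },
];

const newerSubset = [
  { f1: "f1a", f2: "f2a", f3: "f3a" },
  { f1: "f1b", f2: "f2b", f3: "f3b" },
  { f1: "f1c", f2: "f2c", f3: "f3c" },
];

// Only the f1d/f2d document is missing from the second round.
const stale = findStale(olderSet, newerSubset);
console.log(stale);
```

The Set lookup keeps this linear in the size of both lists instead of comparing every older document against every newer one.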
Steve
  • Could you give some sample data, and sample code of what you have already tried? – Dan Green-Leipciger Apr 15 '17 at 20:57
  • @DanGreen-Leipciger I'm at a loss. I did add some example data as you requested. I haven't tried anything yet because I don't know any strategy for doing this in Mongoose. What would you do? I'd like to avoid manually iterating through each record on each side to find those that don't need to be removed. The SQL query provides an excellent framework. – Steve Apr 15 '17 at 21:29
  • It isn't LINQ. It's grep http://stackoverflow.com/questions/2963281/javascript-algorithm-to-find-elements-in-array-that-are-not-in-another-array – Steve Apr 15 '17 at 21:45
  • If you are trying to get rid of the old records, why not just remove the collection and re-add it? You could create a temporary collection, once it has successfully populated you can remove the old one and re-create it with the new data. – Dan Green-Leipciger Apr 15 '17 at 22:10
  • @DanGreen-Leipciger See, Dan, that is the kind of suggestion I was looking for. It didn't occur to me. Would you consider that risky at all for Mongo/Mongoose with, say, 1000 documents? Could it overload Mongo or take so long that it causes problems for other concurrent users? – Steve Apr 15 '17 at 22:13
  • I regularly copy 60K records from a production to a dev database and it usually takes about 60 seconds. 1K records shouldn't take much longer than a second (at most). Are you doing this with Mongoose (ORM) or Mongo via a shell or client like Mongo Chef? – Dan Green-Leipciger Apr 15 '17 at 22:17

2 Answers


If you have a set of documents that you would like to use to replace ALL of the documents in an existing collection, the best and safest way to do this is to use a temporary collection.

The following steps assume your collection is called foo:

  1. Insert the new documents into a temporary collection called foo_temp

  2. Once all the records have been added (in a callback or a then()), rename the original foo collection to foo_old

  3. Rename the foo_temp collection to foo

  4. Drop the collection foo_old
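The four steps above can be sketched with the MongoDB Node.js driver's `insertMany`, `rename`, and `drop` collection methods. This assumes `db` is a connected driver `Db` (with Mongoose, `mongoose.connection.db` exposes one); `replaceCollection` is a hypothetical helper name. Note that foo_temp starts with only the default `_id` index, so any secondary indexes on foo would need to be created on foo_temp before the swap.

```javascript
// Hypothetical helper: replace the contents of collection `name` with `docs`
// by loading a temporary collection and swapping it in.
// `db` is assumed to be a connected MongoDB driver Db instance.
async function replaceCollection(db, name, docs) {
  // Safety guard (a design choice, not part of the steps above):
  // never swap in an empty result, e.g. from a failed API call.
  if (!Array.isArray(docs) || docs.length === 0) {
    throw new Error("refusing to replace " + name + " with an empty set");
  }

  const tempName = `${name}_temp`;
  const oldName = `${name}_old`;

  // 1. Insert the new documents; MongoDB creates foo_temp on the first insert.
  await db.collection(tempName).insertMany(docs);

  // 2. Only after the insert has succeeded, move the original collection aside.
  await db.collection(name).rename(oldName);

  // 3. Promote the temporary collection to the original name.
  await db.collection(tempName).rename(name);

  // 4. Drop the previous data.
  await db.collection(oldName).drop();
}
```

Because each step is awaited, a failed insert leaves the original foo untouched.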

Notes:

  • In MongoDB, foo_temp does not need to be created beforehand; it is created automatically by the first insert.
  • Performance should not be an issue, as you are only handling 1K records or so. Still, it wouldn't hurt to do it overnight.
  • As noted in the question, the IDs are set explicitly rather than auto-generated; if they were auto-generated, the new ones would not match the old ones.

References:

Inserting multiple documents

Renaming a collection

Dropping a collection

Dan Green-Leipciger
  • I am also using the auto-generated ID, but the fact that it changes is of no consequence. – Steve Apr 15 '17 at 23:23
  • Awesome. Glad I could help. – Dan Green-Leipciger Apr 15 '17 at 23:23
  • Thinking a little deeper about this: in SQL I would have a table with other attached named objects such as indexes, triggers, etc. In order to do what you suggest, these objects have to be dropped before any renaming and then re-created to preserve application integrity. I don't think Mongo has any such concerns, does it? – Steve Apr 15 '17 at 23:25
  • Nope, you should be able to rename at will. It's easy enough to test, though. I suggest using Studio 3T (formerly MongoChef); you can try all of this out there using the mongo shell, all within the comfort of a GUI. – Dan Green-Leipciger Apr 15 '17 at 23:28

As you say, you could iterate over a subset of the older documents, matching each of them to the list of newer documents on the natural key. When you find an older document that is not in the newer list, delete it.

In LINQ this would be easy. Is that available to you?

I don't know how to do it in mongoose without iterating down one side or the other and/or without LINQ.
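One way to avoid client-side iteration is to swap in MongoDB's `$nin` operator and let the server do the anti-join in a single `deleteMany`. This is a sketch, not the answer's method: `Transaction` is a hypothetical Mongoose model over the olderset collection, `pruneCompany` is a hypothetical helper, and f1/f2 stand for companyName/transactionID as in the question.

```javascript
// Hypothetical helper: delete every stored transaction for `company`
// whose transactionID (f2) is absent from the fresh API result.
// `Transaction` is assumed to be a Mongoose model over olderset.
function pruneCompany(Transaction, company, freshDocs) {
  // Only this company's transaction IDs matter for the comparison.
  const freshIds = freshDocs
    .filter((doc) => doc.f1 === company)
    .map((doc) => doc.f2);

  // MongoDB evaluates the filter server-side, so no documents are loaded into Node.js.
  return Transaction.deleteMany({
    f1: company, // never touch other companies' transactions
    f2: { $nin: freshIds }, // delete IDs missing from the fresh subset
  });
}
```

This only removes stale documents; the fresh ones still need to be inserted or upserted separately. If the API call fails and returns an empty list, `$nin: []` matches every transaction for that company, so it may be worth checking the fresh result before calling it.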

Steve