0

Background:

I have a source MongoDB database containing transactional data in documents and these have the standard "_id" field as an index.

I also have an SQL SSRS Reporting database for this same transactional data that I create data extracts from.

Part of the verification for an extract to get created, is a count validation of the number of records between the source and Reporting database. This is currently failing saying that I have one extra record in my Reporting database for a certain customer/date range.

The reporting database also has the same "_id" field as a Primary Key in its tables named "TransactionId".

Both the MongoDB and SQL Reporting databases also have the concept of a CreationTimestamp, which is common across both databases at a transaction level.

Question:

Using database queries, how can I find the extra record in SQL that isn't in my source MongoDB collection?

Also, it would be handy to be able to do this the other way around (in MongoDB but not in SQL), which would probably be more common.

Mike
  • 827
  • 11
  • 27

1 Answers1

0

I didn't get an answer here so worked on figuring this out myself. I will post the solution here but I'd be very happy to see any alternative/better solutions as this is a bit clunky.

Find and sort MongoDB UUIDs between 2 dates

// This should match the “Source count” in my logs
db.getCollection('My_Mongo_Collection')
    .find({"CreationTimestamp" : {$gte : ISODate('2020-02-18T00:00:00.000Z'), $lte : ISODate('2020-02-27T23:59:59.999Z')}})
    .count()

// This will find and sort the “_id” UUIDs of the documents within the date range
var cursor = db.getCollection('My_Mongo_Collection')
    .find({"CreationTimestamp" : {$gte : ISODate('2020-02-18T00:00:00.000Z'), $lte : ISODate('2020-02-27T23:59:59.999Z')}}, {"_id": 1})
    .sort({"CreationTimestamp" : -1})

// This will format the resulting UUIDs into a format that SQL expects for insertion into a temporary table
cursor.forEach(function(user){
    var str = user._id.valueOf()+",";
    print(
        str.replace("UUID(\"", "('")
          .replace("\"),", "'),"));
});

Find extra/missing UUID in SQL

--DROP TABLE #temp

-- This will create a temporary table in memory
CREATE TABLE #temp (
id VARCHAR(50))

-- Insert the formatted UUIDs from the Mongo script query. This will create a temporary 
-- table in memory
INSERT INTO #temp
VALUES 
('238832d1-d0ab-4dc3-80cb-ab6d00b811ae'),
('00ceef1e-1b9e-4f1f-a8d8-ab6d00b811ae'),
...

-- When inserting more than 1000 records the following error will be seen. 
-- Msg 10738, Level 15, State 1, Line 2011
-- The number of row value expressions in the INSERT statement exceeds the maximum allowed 
-- number of 1000 row values.

-- Click on the error and continue like so, deleting the comma above the error and creating 
-- a second insert statement for the second 1000 records, run the query again to get the 
-- next 1000 error and keep repeating this cycle until you get to the last record:
('7c505e5e-41b4-4e40-9044-ab6d00b811ae'),
('5f197206-b218-4785-9f0f-ab6d00b811ae')

INSERT INTO #temp
VALUES 
('a9d174e9-a30a-42b5-8815-ab6d00b811ae'),
('72c728e6-0c57-4109-89ca-ab6d00b811ae')
. . .

-- Delete the trailing comma off the last record like so and run the CREATE and INSERT 
-- statements.
('672af0f6-d643-4101-acb3-ab6500fc539c'),
('78bf2c9f-20b4-4fa2-8c06-ab6500fc539c')

Find extra in MongoDB but not in SQL

-- Find extra records in Mongo that are not in SQL
SELECT * FROM #temp 
WHERE id NOT IN 
       (SELECT TransactionId FROM My_Reporting_Table)

Find extra in SQL but not in MongoDB

-- Find extra records in SQL that are not in Mongo
SELECT * FROM My_Reporting_Table
WHERE TransactionId NOT IN
       (SELECT id FROM #temp) AND CreationTimestamp BETWEEN '2020-02-19 00:00:00.000' AND '2020-02-28 23:59:59.999'

NOTE: Mongo is case sensitive and uses lowercase UUIDs only, SQL is not case sensitive and can use lowercase or uppercase UUIDs.

There is no difference between UUIDs and GUIDs Is there any difference between a GUID and a UUID?

Mike
  • 827
  • 11
  • 27