1

Context:

I'm currently working on a data pipeline project where the aim is to sync MongoDB documents into a SQL server. However, the schema and data are on separate documents in the form shown below:

Schema Document:

{
    schema_id: someHex,
    key_map:\[
        0:  {
            key: "a",
            column_name: "colName1",
            type: "Int"
        },
        1:  {
            key: "b",
            column_name: "colName2",
            type: "String"
        },
        2:  {
            key: "c",
            column_name: "colName3",
            type: "Boolean"
        }
    ]
}

Data Document:

{
    data_id: someHex,
    schema_id: someHex,
    data: [
        0: {
            key: "a",
            value: 252
        },
        1: {
            key: "b",
            value: "some_string"
        },
        2: {
            key: "c",
            value: True
        }
    ]
}

The destination table should look like this:

schema_id colName1 colName2 colName3
someHex 252 some_string 1

Question:

The issue lies in finding a convenient/maintainable way to sync such data using tools like Fivetran/Airbytes to help handle error management and the like. Been researching various open-source tools and have found none that offer such capabilities so help will be very much appreciated.

I have currently implemented the sync which goes like this:

MongoDB -> Airbytes -> S3 -> Lambda -> SQL

Sync mongo data to S3 using Airbytes, an AWS lambda function then transforms/normalizes the Json files into a columnar form which then gets pushed to a SQL server

NeroM1
  • 11
  • 3

0 Answers0