
I have an API which receives a large JSON object (at least ~7 MB). This JSON consists of some nested objects, as follows:

[
  {
    "CategoryId": "P1",
    "CategoryName": "Pizza",
    "Products": [
      {
        "ProductId": "PROD700",
        "ProductName": "Pepperoni Feast Large",
        "ProductPrice": "5.5",
        "Choices": [
          {
            "CoiceId": "CH22",
            "ChoiceName": "Crust",
            "Extras": [
              {
                "ExtraId": "EX1",
                "ExtraName": "Classic Hand Tossed",
                "ExtraPrice": "1"
              },
              {
                "ExtraId": "EX2",
                "ExtraName": "Crunchy Thin Crust",
                "ExtraPrice": "1.25"
              }
            ]
          },
          {
            "CoiceId": "CH24",
            "ChoiceName": "Additionals",
            "Extras": [
              {
                "ExtraId": "EX3",
                "ExtraName": "Extra Black Olives",
                "ExtraPrice": "0.325"
              },
              {
                "ExtraId": "EX4",
                "ExtraName": "Extra Jalapeno",
                "ExtraPrice": "0.4"
              }
            ]
          }
        ]
      }
    ]
  }
]
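
For reference, here is a minimal sketch of the model classes this payload could deserialize into (e.g. with Json.NET); the class and property names simply mirror the JSON above, and the prices are kept as strings because that is how they arrive in the sample:

// Model classes mirroring the sample payload; the root of the document is a List<Category>.
using System.Collections.Generic;

public class Category
{
    public string CategoryId { get; set; }
    public string CategoryName { get; set; }
    public List<Product> Products { get; set; }
}

public class Product
{
    public string ProductId { get; set; }
    public string ProductName { get; set; }
    public string ProductPrice { get; set; }
    public List<Choice> Choices { get; set; }
}

public class Choice
{
    public string ChoiceId { get; set; }
    public string ChoiceName { get; set; }
    public List<Extra> Extras { get; set; }
}

public class Extra
{
    public string ExtraId { get; set; }
    public string ExtraName { get; set; }
    public string ExtraPrice { get; set; }
}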

The API receives the JSON and saves it in a queue until another background service (i.e. a console app or Windows service) calls the same API to read a list of PENDING requests that need to be written to the database.

In fact, this is a very simplified sample object; I just wanted to share the idea and structure of the object. I have a monolithic database with very heavy traffic per second, so I considered the options below:

  • Having some nested loops to save the above data one-by-one in the database, which I consider too bad: it will hurt database performance due to the many round-trips, and in addition it will take too long to complete.
  • Using the previous scenario but with parallelism, e.g. something like Parallel.For, to reduce the execution time as much as I can; however, we still have the issue of many database hits.
  • To overcome the above two challenges (execution time & multiple database hits), I thought about using the staging-tables concept along with SqlBulkCopy (see the sketch after this list): we can have some staging tables in the main db or tempdb, and after the insertion/bulk copy a stored procedure with a MERGE statement inserts the data from those staging tables into the main tables. The main challenge here is that if the console app is processing more than one request/object at the same time, the staging table will be locked during the SqlBulkCopy. In addition, removing the indexes from the staging table during the copy makes it as fast as possible, but before the MERGE we need the indexes back to speed up reading from those staging tables. The challenge with the CREATE & DROP of indexes is that it is hard and not recommended, especially in two cases: (1) the staging table holds a large amount of data; (2) I start creating indexes as preparation for the MERGE while another SqlBulkCopy is in progress in parallel for another request.
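
To make the third option concrete, here is a rough sketch of the bulk-copy step for one flattened level (categories), reusing the Category class sketched above. The staging table dbo.Staging_Category, the stored procedure dbo.usp_MergeStagedCategories and the BatchId column are illustrative assumptions only; the idea of the BatchId is to let concurrent requests share the same staging table, so each MERGE touches only its own batch instead of relying on dropping/re-creating indexes or exclusive access:

// Sketch only: bulk-copy flattened categories into a staging table inside a transaction,
// then call a stored procedure whose MERGE moves that batch into the main table.
// Table, procedure and column names are illustrative assumptions, not existing objects.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class StagingLoader
{
    public static void LoadCategories(string connectionString, IEnumerable<Category> categories)
    {
        var batchId = Guid.NewGuid(); // identifies this request's rows in the shared staging table

        // Build a DataTable whose columns match the staging table exactly.
        var staging = new DataTable();
        staging.Columns.Add("BatchId", typeof(Guid));
        staging.Columns.Add("CategoryId", typeof(string));
        staging.Columns.Add("CategoryName", typeof(string));
        foreach (var category in categories)
            staging.Rows.Add(batchId, category.CategoryId, category.CategoryName);

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            {
                // One round-trip for the whole batch instead of row-by-row inserts.
                using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
                {
                    bulk.DestinationTableName = "dbo.Staging_Category";
                    bulk.BatchSize = 5000;
                    bulk.WriteToServer(staging);
                }

                // MERGE only this batch's rows, so parallel loads do not step on each other.
                using (var merge = new SqlCommand("dbo.usp_MergeStagedCategories", connection, transaction))
                {
                    merge.CommandType = CommandType.StoredProcedure;
                    merge.Parameters.Add("@BatchId", SqlDbType.UniqueIdentifier).Value = batchId;
                    merge.ExecuteNonQuery();
                }

                transaction.Commit();
            }
        }
    }
}

The same pattern would repeat for products, choices and extras (or a single fully flattened staging table), and filtering the MERGE by BatchId may sidestep the index drop/create problem described above.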

What is the question? It is about app architecture. Simply, I want to perform the write to the database without hurting database performance and without consuming a lot of resources on both the app and db servers. In addition, I want to resolve the challenges mentioned in the scenarios above: the execution time, the many round-trips to the db, and the locking of tables during SqlBulkCopy when there are concurrent requests.

I am just sharing my thoughts about that, but I am totally open to hearing from you if you have a better idea/implementation for this scenario.

NOTES:

  • I am using ADO.NET along with stored procedures to speed up the whole process.
  • Each request should be written to the database within 5-10 minutes at most from its creation/posting time.
  • Sometimes we have multiple requests which need to be written in parallel; sequential processing will not be acceptable from a business perspective.
Ahmed Negm
  • You have done a good enough job of describing your scenario, but I don't think there is a single question contained within it (certainly no question marks). You may elicit a better response if you break this down into more specific questions; otherwise there is a good chance this will be closed as too broad. Also, what version of SQL Server are you using? – GarethD Mar 07 '19 at 09:08
  • Will edit it now with more clarification ... – Ahmed Negm Mar 07 '19 at 09:10
  • @GarethD I just edited the question – Ahmed Negm Mar 07 '19 at 09:18
  • Have you considered using [SQL Server Service Broker](https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2005/administrator/ms345108(v=sql.90))? – GarethD Mar 07 '19 at 09:37
  • The issue is not the queuing part, it is entirely in the processing part, even for a single request; and if I have multiple requests, I think they should all be processed in parallel, but the consequences of this parallelism should be considered. – Ahmed Negm Mar 07 '19 at 09:40
  • If it is just the processing, I would be inclined to process the JSON outside of your database and parse it into tables that are similar to your database structure, then save the data into your database by passing these DataTables to stored procedures using [table-valued parameters](https://learn.microsoft.com/en-us/sql/relational-databases/tables/use-table-valued-parameters-database-engine?view=sql-server-2017). – GarethD Mar 07 '19 at 09:46
  • It might be a good idea, but we have to consider the auto-generated IDs which should be used to link the components together. I mean, if I did that with `Categories`, I should have each category's identity to link the underlying products with it ... etc. – Ahmed Negm Mar 07 '19 at 09:50
  • Capturing autogenerated IDs is fairly trivial if you use `OUTPUT`. If you use [`OUTPUT` along with `MERGE`](https://stackoverflow.com/a/10950418/1048425), this will allow you to capture rows from both the source and the destination, and can help avoid duplicating work (e.g. inserting in one statement, then looking up against the table to get the ID in another statement). – GarethD Mar 07 '19 at 10:42
  • Let me give it a try – Ahmed Negm Mar 07 '19 at 10:45
  • What version of SQL Server? You can work with JSON directly in SQL Server 2016. Then use an `INSERT ... SELECT * FROM OPENJSON(@json)`. – Steve Ford Mar 07 '19 at 14:17
  • @SteveFord It's 2012, so fetching from JSON will not work here :( – Ahmed Negm Mar 07 '19 at 14:57
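
Following the table-valued-parameter route suggested by GarethD above, here is a minimal sketch of the client side; the user-defined table type dbo.CategoryTableType and the stored procedure dbo.usp_SaveCategories are hypothetical names and would have to be created first:

// Sketch only: pass a whole DataTable of parsed categories to a stored procedure
// as a table-valued parameter, so the save is a single round-trip.
// dbo.CategoryTableType and dbo.usp_SaveCategories are assumed/hypothetical names.
using System.Data;
using System.Data.SqlClient;

public static class TvpWriter
{
    public static void SaveCategories(string connectionString, DataTable categories)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.usp_SaveCategories", connection))
        {
            command.CommandType = CommandType.StoredProcedure;

            // Structured + TypeName mark this parameter as a TVP of the user-defined table type.
            var parameter = command.Parameters.AddWithValue("@Categories", categories);
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.CategoryTableType";

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}

Inside such a procedure, the MERGE ... OUTPUT pattern from the comment above could return each generated identity together with its natural CategoryId, giving the mapping needed to link the child products without a second lookup.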
