I have an API that receives a large JSON object (at least ~7 MB). The JSON consists of nested objects, as follows:
```json
[
  {
    "CategoryId": "P1",
    "CategoryName": "Pizza",
    "Products": [
      {
        "ProductId": "PROD700",
        "ProductName": "Pepperoni Feast Large",
        "ProductPrice": "5.5",
        "Choices": [
          {
            "ChoiceId": "CH22",
            "ChoiceName": "Crust",
            "Extras": [
              {
                "ExtraId": "EX1",
                "ExtraName": "Classic Hand Tossed",
                "ExtraPrice": "1"
              },
              {
                "ExtraId": "EX2",
                "ExtraName": "Crunchy Thin Crust",
                "ExtraPrice": "1.25"
              }
            ]
          },
          {
            "ChoiceId": "CH24",
            "ChoiceName": "Additionals",
            "Extras": [
              {
                "ExtraId": "EX3",
                "ExtraName": "Extra Black Olives",
                "ExtraPrice": "0.325"
              },
              {
                "ExtraId": "EX4",
                "ExtraName": "Extra Jalapeno",
                "ExtraPrice": "0.4"
              }
            ]
          }
        ]
      }
    ]
  }
]
```
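For reference, the nested structure above maps roughly to the following object model (property names are taken from the sample; the class names are my own, and the prices are kept as strings only because that is how they arrive in the payload):

```csharp
using System.Collections.Generic;

// Assumed POCOs matching the sample payload above, for illustration only.
public class Category
{
    public string CategoryId { get; set; }
    public string CategoryName { get; set; }
    public List<Product> Products { get; set; }
}

public class Product
{
    public string ProductId { get; set; }
    public string ProductName { get; set; }
    public string ProductPrice { get; set; }
    public List<Choice> Choices { get; set; }
}

public class Choice
{
    public string ChoiceId { get; set; }
    public string ChoiceName { get; set; }
    public List<Extra> Extras { get; set; }
}

public class Extra
{
    public string ExtraId { get; set; }
    public string ExtraName { get; set; }
    public string ExtraPrice { get; set; }
}
```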
The API receives the JSON and saves it in a queue, until another background service (i.e. a console app or Windows service) calls the same API to read the list of PENDING requests and write them to the database.
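Just to illustrate that flow, here is a minimal sketch of the consuming side. The route `api/requests?status=PENDING` and the `PendingRequest` shape are assumptions for the example, not the real contract:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical shape of a queued request returned by the API.
public class PendingRequest
{
    public Guid RequestId { get; set; }
    public string Payload { get; set; } // the raw ~7 MB JSON document
}

public class RequestPoller
{
    private static readonly HttpClient Http =
        new HttpClient { BaseAddress = new Uri("https://my-api.example.com/") };

    // The background service calls this periodically to fetch PENDING requests.
    public async Task<List<PendingRequest>> GetPendingAsync()
    {
        var json = await Http.GetStringAsync("api/requests?status=PENDING");
        return JsonSerializer.Deserialize<List<PendingRequest>>(json);
    }
}
```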
The JSON above is of course only a simplified sample; I just wanted to share the idea and structure of the object. I also have a monolithic database under very heavy traffic per second, so I considered the options below:
- Having some nested loops that save the above data row-by-row in the database, which I consider a bad option: it hurts database performance due to the many round-trips, and it takes far too long to complete.
- Using the previous scenario but with parallelism, e.g. something like `Parallel.For`, to reduce the execution time as much as possible, but we would still have the issue of many database hits.
- To overcome both challenges (execution time and multiple database hits), I thought about using the staging-table concept along with `SqlBulkCopy` (see the sketch after this list): we can have some staging tables in the main DB or in `tempdb`, and after the insertion/bulk copy a stored procedure with a `MERGE` statement moves the data from those staging tables into the main tables. The main challenge here is that if the console app processes more than one request/object at the same time, the staging table will be locked during the `SqlBulkCopy`. In addition, dropping the indexes on the staging table speeds up the copy as much as possible, but before the `MERGE` the indexes need to be back in place to speed up reading from those staging tables. So the `CREATE`/`DROP` index cycle becomes the problem; it is hard to manage and not recommended, especially in two cases: (1) the staging table holds a large amount of data, and (2) I start creating indexes in preparation for the `MERGE` while another `SqlBulkCopy` for a different request is still in progress in parallel.
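To make the third option concrete, here is a minimal sketch of that flow, reusing the classes above and assuming a flat staging table `dbo.Staging_Extras` and a stored procedure `dbo.usp_MergeExtras` that contains the `MERGE` (both names are made up for the example):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient

public static class BulkLoader
{
    // Flattens Category -> Product -> Choice -> Extra into one row per Extra.
    // Column names and types here are assumptions for illustration only.
    public static DataTable ToStagingTable(IEnumerable<Category> categories)
    {
        var table = new DataTable("Staging_Extras");
        table.Columns.Add("CategoryId", typeof(string));
        table.Columns.Add("ProductId", typeof(string));
        table.Columns.Add("ChoiceId", typeof(string));
        table.Columns.Add("ExtraId", typeof(string));
        table.Columns.Add("ExtraName", typeof(string));
        table.Columns.Add("ExtraPrice", typeof(decimal));

        foreach (var c in categories)
            foreach (var p in c.Products)
                foreach (var ch in p.Choices)
                    foreach (var e in ch.Extras)
                        table.Rows.Add(c.CategoryId, p.ProductId, ch.ChoiceId, e.ExtraId,
                                       e.ExtraName,
                                       decimal.Parse(e.ExtraPrice, CultureInfo.InvariantCulture));
        return table;
    }

    public static void Load(string connectionString, DataTable staging)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1) Bulk copy into the staging table instead of row-by-row inserts.
            using (var bulk = new SqlBulkCopy(conn)
            {
                DestinationTableName = "dbo.Staging_Extras",
                BatchSize = 10_000,
                BulkCopyTimeout = 0
            })
            {
                bulk.WriteToServer(staging);
            }

            // 2) MERGE from staging into the main tables via a stored procedure
            //    (dbo.usp_MergeExtras is an assumed name; it would hold the MERGE statement).
            using (var cmd = new SqlCommand("dbo.usp_MergeExtras", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.CommandTimeout = 0;
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```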
What is the question? It is about app architecture. Simply put, I want to perform the write to the database without hurting database performance and without consuming a lot of resources on either the app or the DB server. In addition, I want to resolve the challenges mentioned in the scenarios above: the execution time, the many round-trips to the DB, and the table locking during `SqlBulkCopy` when there are concurrent requests.
I am just sharing my thoughts on this, but I am totally open to a better idea/implementation for this scenario.
NOTES:
- I am using `ADO.NET` along with stored procedures to speed up the whole process.
- Each request should be written to the database within 5-10 minutes at most from its creation/posting time.
- Sometimes we have multiple requests that should be written in parallel (see the sketch below this list), and processing them sequentially would not be acceptable from a business perspective.
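Tying these notes together, this is roughly how I picture the console app fanning out over pending requests with a capped degree of parallelism; the cap of 4 is an arbitrary example value, and `BulkLoader.Load` refers to the sketch above:

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

public static class Worker
{
    public static void ProcessPending(List<PendingRequest> pending, string connectionString)
    {
        // Bound the parallelism so concurrent SqlBulkCopy operations do not
        // overwhelm the database; 4 is just an example value.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        Parallel.ForEach(pending, options, request =>
        {
            var categories = JsonSerializer.Deserialize<List<Category>>(request.Payload);
            var staging = BulkLoader.ToStagingTable(categories);
            BulkLoader.Load(connectionString, staging);
        });
    }
}
```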