
I have a table like the one below in SQL Server:

[screenshot of the table; its columns are userid, buy, model, price, color, number, balance, time, type]

I have an API that receives 15,000 records every 10 seconds to save in the table above: if the userid already exists, update that row; if it does not exist, insert a new record. I use the following code with pyodbc (in Python) to write each user's record, which means I run the following statement 15,000 times:

update Mytable 
set buy = ?, model = ?, price = ?, color = ?, number = ?,
    balance = ?, time = ?, type = ?
where userid = ?

if @@ROWCOUNT = 0
    insert into Mytable (userid, buy, model, price, color, 
                         number, balance, time, type)
    values (?, ?, ?, ?, ?, ?, ?, ?, ?)
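
For context, the per-row upsert in pyodbc looks roughly like the sketch below; the connection string, the `records` list, and `upsert_sql` are placeholders for illustration, not the real API code:

import pyodbc

# Placeholder connection, for illustration only
conn = pyodbc.connect('Driver={SQL Server};Server=server_name;'
                      'Database=database_name;Trusted_Connection=yes;')
cursor = conn.cursor()

# upsert_sql holds the UPDATE / IF @@ROWCOUNT = 0 / INSERT statement shown above,
# with 9 parameters for the UPDATE and 9 for the INSERT (18 '?' in total).
# records is the list of dicts received from the API (placeholder).
for r in records:
    cursor.execute(upsert_sql,
                   r['buy'], r['model'], r['price'], r['color'], r['number'],
                   r['balance'], r['time'], r['type'], r['userid'],
                   r['userid'], r['buy'], r['model'], r['price'], r['color'],
                   r['number'], r['balance'], r['time'], r['type'])
    conn.commit()   # one round trip and one transaction per record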

The above code works fine, but it takes 700 seconds for 15,000 records. I only have this API and no other information about the number of users and ...

How can I save 15,000 records in less than seven seconds?

  • related: https://stackoverflow.com/q/62388767/2144390 – Gord Thompson Sep 10 '21 at 20:15
  • @GordThompson My problem is the slowness, not storing the data – henrry Sep 10 '21 at 20:21
  • It looks like you are processing one row at a time which is bound to be slow. It will be much faster to upload the rows in bulk and then merge them on the server. – Gord Thompson Sep 10 '21 at 20:26
  • 1
    Yep, batch them up. You need a buffer that releases messages to SQL as a set. You will probably want to set two release conditions - either buffer full, or timer elapsed, whichever comes first. – allmhuran Sep 10 '21 at 20:28
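
A minimal sketch of the buffering idea from the last comment; the class name, the sizes, and the `flush_to_sql` callback are all hypothetical:

import threading
import time

class RecordBuffer:
    """Collects incoming records and releases them to SQL Server as one batch
    when the buffer is full or a time limit has elapsed, whichever comes first."""

    def __init__(self, flush_to_sql, max_rows=15000, max_seconds=10):
        self._flush_to_sql = flush_to_sql       # callback that bulk-writes a list of records
        self._max_rows = max_rows
        self._max_seconds = max_seconds
        self._rows = []
        self._lock = threading.Lock()
        self._last_flush = time.monotonic()

    def add(self, record):
        with self._lock:
            self._rows.append(record)
            due = (len(self._rows) >= self._max_rows or
                   time.monotonic() - self._last_flush >= self._max_seconds)
        if due:
            self.flush()

    def flush(self):
        with self._lock:
            rows, self._rows = self._rows, []
            self._last_flush = time.monotonic()
        if rows:
            self._flush_to_sql(rows)            # one round trip for the whole batch

# A background timer (or scheduler) would also call flush() periodically so a
# partially filled buffer is not held forever.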

2 Answers


How can I save 15,000 records in less than seven seconds?

The most important things (in order of importance) are:

  1. not to send 15,000 separate batches or RPCs to SQL Server

  2. not to run 15,000 separate transactions

  3. not to run 15,000 separate DML statements.

The easiest way to do this from Python is to send the data as a JSON document and insert/update the data using batch update/insert, or merge.

You can send all 15,000 at once or break them into a few batches. See, e.g., Trying to insert pandas dataframe to temporary table

Once you have the data on the server you can use TSQL's MERGE, or a single UPDATE and single INSERT statement for the whole batch.
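
A rough sketch of that approach with pyodbc, sending the whole batch as one JSON parameter and upserting it with MERGE over OPENJSON; the column names come from the question, but the varchar(60) types, the JSON property names, and the `upsert_batch` helper are assumptions:

import json
import pyodbc

MERGE_SQL = """
DECLARE @json NVARCHAR(MAX) = ?;

MERGE Mytable AS tgt
USING (
    SELECT *
    FROM OPENJSON(@json)
    WITH (
        userid  varchar(60) '$.userid',
        buy     varchar(60) '$.buy',
        model   varchar(60) '$.model',
        price   varchar(60) '$.price',
        color   varchar(60) '$.color',
        number  varchar(60) '$.number',
        balance varchar(60) '$.balance',
        time    varchar(60) '$.time',
        type    varchar(60) '$.type'
    )
) AS src
ON tgt.userid = src.userid
WHEN MATCHED THEN
    UPDATE SET buy = src.buy, model = src.model, price = src.price,
               color = src.color, number = src.number, balance = src.balance,
               time = src.time, type = src.type
WHEN NOT MATCHED THEN
    INSERT (userid, buy, model, price, color, number, balance, time, type)
    VALUES (src.userid, src.buy, src.model, src.price, src.color,
            src.number, src.balance, src.time, src.type);
"""

def upsert_batch(conn, records):
    # records: a list of dicts whose keys match the JSON paths above
    cursor = conn.cursor()
    cursor.execute(MERGE_SQL, json.dumps(records))
    conn.commit()

# Usage: upsert_batch(pyodbc_connection, list_of_15000_record_dicts)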

  • Thanks. Can you give me the T-SQL code to insert and, at the same time, update if the row exists? – henrry Sep 11 '21 at 15:35
  • Check the link for MERGE. – David Browne - Microsoft Sep 11 '21 at 15:36
  • Does it make sense to first insert the data as JSON and then delete the duplicate old records? – henrry Sep 11 '21 at 15:50
  • 1
    TIL: I just tested the JSON bulk upload against a plain old `df.to_sql()` with `fast_executemany=True`. I expected the JSON approach to be a bit slower because of the extra overhead with JSON repeating the column names for each row and representing all values as strings, but it turned out that `.to_sql()` actually took almost 40% longer to upload 5 columns x 1_000_000 rows. I guess the time pyodbc takes to build the parameter array is considerably longer than the time saved in pushing the packed data over the wire. (YMMV, of course.) – Gord Thompson Sep 11 '21 at 16:17
  • 1
    Okay, so the JSON upload did generate more network traffic: 234 MiB vs. 164 MiB for `.to_sql()`. Still, the JSON approach is consistently faster, Go figure. – Gord Thompson Sep 11 '21 at 17:26
  • Yep. Pandas with fast_executemany doesn’t use the bulk loading APIs for SQL Server, just regular parameterized statements. And it’s the round trips more than the number of bytes that cause trouble. – David Browne - Microsoft Sep 11 '21 at 18:40

Thanks to @David Browne and this tutorial on using OPENJSON in SQL Server. Using OPENJSON I write 15,000 rows in just 1 second with the code below (in pyodbc), and then delete the duplicate old records with one query.

Write 15,000 rows:

import pyodbc

conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=server_name;'
                      'Database=database_name;'
                      'Trusted_Connection=yes;')

cursor = conn.cursor()

SqlSave = """
DECLARE @json NVARCHAR(MAX) = '[
    { "userid" : "14F2G34", "buyV" : "pen", "modelM" : "Bic",     "color" : "red",    "numberB" : 4000, "balanceVal" : 750, "timeBuy" : 1631303488, "type" : "simple" },
    { "userid" : "14F2G35", "buyV" : "pen", "modelM" : "blueBic", "color" : "blue",   "numberB" : 1000, "balanceVal" : 150, "timeBuy" : 1631303488, "type" : "coly" },
    { "userid" : "14F2G36", "buyV" : "pen", "modelM" : "oli",     "color" : "yellow", "numberB" : 6000, "balanceVal" : 200, "timeBuy" : 1631303488, "type" : "ni" },

    ...

]';

INSERT INTO Mytable (userid, buy, model, color, number, balance, time, type)
SELECT *
FROM OPENJSON(@json, '$')
WITH (
    userid  varchar(60) '$.userid',
    buy     varchar(60) '$.buyV',
    model   varchar(60) '$.modelM',
    color   varchar(60) '$.color',
    number  varchar(60) '$.numberB',
    balance varchar(60) '$.balanceVal',
    time    varchar(60) '$.timeBuy',
    type    varchar(60) '$.type'
);
"""

cursor.execute(SqlSave)
cursor.commit()

Note: if the string above is longer than 4,000 characters, you must use NVARCHAR(MAX).

To update existing rows I delete the old ones instead: I have an auto-incrementing ID column, so I delete all duplicate rows (by userid) except the one with the maximum ID, using the code below:

DELETE FROM mytable
WHERE ID NOT IN
(
    SELECT max(ID)
    FROM mytable
    GROUP BY userid
)
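
If useful, both statements (the OPENJSON INSERT and the duplicate-cleanup DELETE) can be sent from pyodbc back to back and committed together so they run as a single transaction; a small sketch, where `SqlSave` is the insert batch above and `SqlDedupe` is a hypothetical variable holding the DELETE:

cursor = conn.cursor()
cursor.execute(SqlSave)     # the OPENJSON INSERT batch shown above
cursor.execute(SqlDedupe)   # the duplicate-cleanup DELETE shown above
conn.commit()               # pyodbc autocommit is off by default, so both changes commit together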