
So to sum it up quickly: I have a SQL Server database (using SSMS to work with it) with a table that has an order_num column and a description column, e.g. 16548221587 | Small hairbrush. I have an index on the order_num column.

I then have a VB.NET app which lets the user supply a .txt file with a massive list of order_nums (>150,000, one per line). It reads these line by line, searches the database for each one, adds every match to a temp table, and then stream-writes the results to a "Results" .txt file.

As for the title of this question: the code I'll post below works! But I'm clocking each read/find/insert into the temp table at 0.427 seconds per search, and multiplied by 150,000 records that's looking at over 16 hours! So that's what I'm wondering: am I doing this in a topsy-turvy way, or am I expecting way too much in hoping that reading/finding/retrieving that many records could go quicker?

If System.IO.File.Exists(TextBox2.Text) Then
    'read in file list of order numbers to search
    result = From n In System.IO.File.ReadLines(TextBox2.Text)
             Where n.Length = 13
             Select n.Substring(0, 13)
Else
    MessageBox.Show("The file path you entered seems to be invalid. Please try again.")
    Exit Sub
End If

For Each word As String In result
    Dim cmd As New SqlCommand("dbo.OrdersToTemp", con)
    cmd.CommandType = CommandType.StoredProcedure
    cmd.Parameters.Add("@OrderNum", SqlDbType.NVarChar).Value = word.ToString()
    cmd.CommandTimeout = 3000
    cmd.ExecuteNonQuery()
Next

Using sw As New StreamWriter(TextBox2.Text.Substring(0, TextBox2.TextLength - 4) + "-results.txt")
    Dim retrieveResults As New SqlCommand("dbo.GetResults", con)
    retrieveResults.CommandType = CommandType.StoredProcedure
    retrieveResults.CommandTimeout = 3000

    Using RDR = retrieveResults.ExecuteReader()
        Do While RDR.Read()
            OrderDescription = RDR("Description").ToString()
            sw.WriteLine(OrderDescription)
        Loop
    End Using
End Using

UPDATE

I have taken some of the advice here, and I now SqlBulkCopy the order_nums that need to be searched into a temp table; this part is quite fast. I am then using a query such as

SELECT o.order_num, description
from OrderTable o
join TempOrderIdTable t on t.order_num = o.order_num

But it still seems quite slow: getting even just 170 results takes about 30 seconds, which in my eyes is quite slow. I put a clustered index on order_num in OrderTable and NO index on the temp table, which is basically just the .txt file loaded into a SQL table.

UPDATE 2

So like I said, I now have a non-clustered index (order_num INCLUDE description) on OrderTable and a clustered index (order_num) on the TempTable, BUT any sort of join, cross apply, etc. still takes over 33 seconds to join just 100 order_nums and return 170 rows, which is still so slow. Here are the joins I am trying:

select o.Order_num, t.description
from Temp_data o
join OrderTable t on t.Order_num = o.Order_num


select x.Order_num, x.description
from OrderTable x
where Order_num in (select Order_num from Temp_data)


select x.Order_num,x.description
from OrderTable x
cross apply (select o.Order_num from Temp_data o where o.Order_num= x.Order_num) ord

SOLVED So it was me being an idiot for the final bit, and you were all correct: when I was making the temp table I accidentally made the column nvarchar, whereas in the actual OrderTable order_num was just a varchar column. Sorry about that guys, me being half asleep!
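For anyone hitting the same wall: making the temp table's order_num column the exact same type as the base table's lets SQL Server seek the index instead of implicitly converting every row (varchar compared to nvarchar gets widened to nvarchar, which kills the seek). A sketch of the staging table DDL — the varchar(13) length is my assumption based on the Substring(0, 13) in the code:

    -- Match the base table's type exactly: varchar, NOT nvarchar.
    -- An nvarchar column here forces an implicit CONVERT on
    -- OrderTable.order_num, turning the index seek into a scan.
    CREATE TABLE TempOrderIdTable (
        order_num varchar(13) NOT NULL
    );

    CREATE UNIQUE CLUSTERED INDEX ux ON TempOrderIdTable (order_num);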

K.Madden

4 Answers


The issue with your code is that you execute this SQL command 150k times. That's never going to work (fast).

What you can do is read the 150k values from the file first, then insert them into a table using SqlBulkCopy, for example as shown in this SO answer. Basically, do what your dbo.OrdersToTemp procedure does in your VB.NET code - all at once instead of one row at a time. This should take a couple of seconds at most, given the latency of your current query.
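A minimal sketch of that bulk-copy step (the staging table name TempOrderIdTable and its order_num column are assumptions carried over from the question; it needs Imports System.Data and System.Data.SqlClient):

    ' Load all the order numbers into a DataTable in memory,
    ' then push them to the staging table in one round trip.
    Dim table As New DataTable()
    table.Columns.Add("order_num", GetType(String))

    For Each line In System.IO.File.ReadLines(TextBox2.Text)
        If line.Length >= 13 Then table.Rows.Add(line.Substring(0, 13))
    Next

    Using bulk As New SqlBulkCopy(con)
        bulk.DestinationTableName = "dbo.TempOrderIdTable"
        bulk.ColumnMappings.Add("order_num", "order_num")
        bulk.WriteToServer(table)
    End Using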

For the following query:

SELECT o.order_num, description
from OrderTable o
join TempOrderIdTable t on t.order_num = o.order_num

I'm assuming OrderTable can contain multiple records per order_num (you mentioned returning 170 rows for 100 orders), so you could use indexes such as:

CREATE INDEX ix ON OrderTable (order_num) INCLUDE (description)

If the order numbers you have in the file are unique (and you can ensure uniqueness):

CREATE UNIQUE CLUSTERED INDEX ux ON TempOrderIdTable (order_num);

If your SQL Server edition supports it, you could compress the index with WITH (DATA_COMPRESSION = PAGE), but this requires an Enterprise license (or Developer, which you can't use in a prod environment).

MarcinJ
  • That's indeed quite very slow, how big is that `OrderTable`?! See my updated answer for a suggestion on indexing. – MarcinJ Apr 11 '19 at 14:49
  • Its about 10 million rows! – K.Madden Apr 11 '19 at 17:52
  • @MarcinJ The indexing on the temp table should do the trick and SqlBulkCopy is better than sharing a file. May I suggest that you add the join query to the answer to complete it and the OP to accept your answer? – J.R. Apr 11 '19 at 19:42
  • Hi @MarcinJ! So i did exactly as you said, added the exact indexes and all but running and getting the query results still takes 35 seconds(for 100 tickets, 170 results)! I do have a feeling and it is something i forgot to mention(apologies) is the descriptions are `nvarchar(max)` as they can be quite long sometimes so maybe that has a affect on it? – K.Madden Apr 15 '19 at 08:10

I do not know the structure of the stored procedures you use in the command, but I would guess that the round trips between the application and the SQL Server take up most of the time. So think about optimization there - for example, send all the numbers to the procedure at once, so you won't have to call the procedure 150k times. Also look at the execution plan of the stored procedures - is there anything that can be improved?
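One way to send all the numbers in a single call is a table-valued parameter. A sketch — the type name OrderNumList, the procedure name GetDescriptions, and the varchar(13) length are all made up for illustration:

    -- A table type holding all the order numbers, passed in one call.
    CREATE TYPE dbo.OrderNumList AS TABLE (order_num varchar(13) NOT NULL);
    GO
    CREATE PROCEDURE dbo.GetDescriptions
        @Orders dbo.OrderNumList READONLY
    AS
        SELECT o.order_num, o.description
        FROM OrderTable o
        JOIN @Orders t ON t.order_num = o.order_num;

From VB.NET the parameter is passed as a DataTable with SqlDbType.Structured, so the whole list travels in one command instead of 150k.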


I'd suggest the following optimisations.

Firstly, bulk insert into the temporary order-id table directly from the .txt file. If you have one order_num per line then this will do:

BULK INSERT TempOrderIdTable
FROM 'C:\data\orderids.txt'
WITH (FIRSTROW = 1,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR='\n' );

For the bulk insert to work the file must be accessible for SQL server, either on the SQL server machine or via a shared location on the network.

Secondly, run a query to get all descriptions in one go by joining onto the temporary order id table:

SELECT o.order_num, description
from OrderTable o
join TempOrderIdTable t on t.order_num = o.order_num

Now you have the results in two steps.

J.R.
  • This will only work if they can get the file to the server where SQL Server is running though. – MarcinJ Apr 11 '19 at 11:57
  • @Marcin Or it needs to be accessible on a network path by the SQL server machine. Good point; let's see what OP says. – J.R. Apr 11 '19 at 12:08
  • Hi guys, so i did my own version of sqlbulkcopy using a datatable and all which works instant or at least very very fast, i then am using the whole join query but even to do 100 order_numbers(return 170) still takes about 30 seconds which is quite slow for a join query, any ideas? – K.Madden Apr 11 '19 at 13:38

Just for future reference (bulk copy and indexes are most certainly the answer)... Don't create a new command on each iteration of a loop, adding the parameters and setting the properties of the command every time. They are the same on every pass; only the value of the parameter changes.

    Dim cmd As New SqlCommand("dbo.OrdersToTemp", con)
    cmd.CommandType = CommandType.StoredProcedure
    cmd.Parameters.Add("@OrderNum", SqlDbType.NVarChar)
    cmd.CommandTimeout = 3000
    For Each word As String In Result
        cmd.Parameters("@OrderNum").Value = word
        cmd.ExecuteNonQuery()
    Next
Mary