1

I have a table that looks something like this:

CREATE TABLE Records 
(
    ID UNIQUEIDENTIFIER PRIMARY KEY NONCLUSTERED,
    owner UNIQUEIDENTIFIER,
    value FLOAT,
    timestamp DATETIME
)

There is a multi-column clustered index on some other columns not relevant to this question.

The table currently has about 500,000,000 rows, and I need to operate on the table but it's too large to deal with currently (I am hampered by slow hardware). So I decided to work on it in chunks.

But if I say

SELECT ID 
FROM records
WHERE ID LIKE '0000%'

The execution plan shows that the ENTIRE TABLE is scanned. I thought that with an index, only those rows that matched the original condition would be scanned until SQL reached the '0001' records. With the % in front, I could clearly see why it would scan the whole table. But with the % at the end, it shouldn't have to scan the whole table.

I am guessing this works different with GUIDs rather than CHAR or VARCHAR columns.

So my question is this: how can I search for a subsection of GUIDs without having to scan the whole table?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Robert Sievers
  • 1,277
  • 10
  • 15
  • if possible you can test this by adding a char column that has id casted to char and create and index for this column, then check the query – AlexYes Jul 03 '19 at 20:01
  • 6
    A GUID is not a string, and the bytes of a GUID [are not stored as they are printed](https://stackoverflow.com/q/7810602/4137916). This query requires a conversion of all individual rows, which is of course none too efficient. It *should* be possible to chunk GUID processing by constructing the appropriate forms and using `>=` and `<` to compare (I don't think `LIKE` patterns will work) but it's not entirely trivial. You might try whether chunking on a `ROW_NUMBER()` is fast enough; it'll certainly be much simpler. – Jeroen Mostert Jul 03 '19 at 20:03
  • @jeroen Mostert. That is intriguing. Because it doesn't matter how I chunk it, just that I do. If the < and > affect the 5th group left to right, I could easily construct the boundaries dynamically. I will test that out. – Robert Sievers Jul 03 '19 at 20:12
  • 1
    One thing that will certainly work for anything that *has* an order (and `UNIQUEIDENTIFIER` does) is simply to ask for the `TOP(x) ORDER BY ...`, keep track of the last row you saw, and issue the next ones with `WHERE ... > lastvalue`. This way it doesn't matter how SQL Server sorts GUIDs, as long as you keep track of where you left off. – Jeroen Mostert Jul 03 '19 at 20:14
  • Having said that, unless your columns are filled by `NEWSEQUENTIALID()` (and even then, since it's not perfectly monotonically increasing) adding new rows will cause your enumeration to break and skip rows, unlike for an `IDENTITY`, which is kind enough only to increase under normal circumstances even when it skips values. So be mindful of any inserts! – Jeroen Mostert Jul 03 '19 at 20:27

1 Answers1

2

From your comments, I see the actual need is to break the rows of random GUID values into chunks (ordered) based on range. In this case, you can specify a range instead of LIKE along with a filter on the desired start/end values in the last group:

SELECT ID
FROM dbo.records
WHERE
    ID BETWEEN '00000000-0000-0000-0000-000000000000'
    AND '00000000-0000-0000-0000-000FFFFFFFFF';

This article explains how uniqueidentifiers (GUIDs) are stored and ordered in SQL Server, comparing and sorting the last group first rather than left-to-right as you might expect. By filtering on the last group, you'll get a sargable expression and touch only those rows in the specified range (assuming an index on ID is used).

Dan Guzman
  • 43,250
  • 3
  • 46
  • 71
  • That works, but with the byte order taken into account. I can break the table into 4096 roughly equal chunks as follows: SELECT ID FROM Records WHERE ID BETWEEN '00000000-0000-0000-0000-000000000000' AND '00000000-0000-0000-0000-000FFFFFFFFF' Then change the values of the 000 in the start of the 5th byte group. If you want to slightly edit your answer, I will mark this closed. Thank you. – Robert Sievers Jul 03 '19 at 20:30
  • It might be worth it for you to read this article on how GUIDs can be, and are, sorted differently in different contexts: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=102450 – pmbAustin Jul 03 '19 at 21:31
  • @RobertSievers, I improved my answer referencing that link and your specific use case. Here's [another article](https://www.dbdelta.com/improving-uniqueidentifier-performance/) detailing byte-swapping for SQL Server ordering for generating sequential GUIDs for SQL Server. – Dan Guzman Jul 03 '19 at 23:20