2

I have a table, let's call it Foo, there is a field called Bar in the table, also, there is a field called LoremIpsum.

Bar is an nvarchar but I'm only interested in Bar values which are convertable to an integer number.

LoremIpsum is also a number.

I want to get the first n (value) numbers beginning with k from my Foo table where LoremIpsum has a specific value (LoremIpsum1) and (value) not in select Bar from Foo where LoremIpsum = LoremIpsum1

My input is: (LoremIpsum1, n, k)

where LoremIpsum1 is a particular value for LoremIpsum, n is the number of numbers to get and k is the offset.

My output should be a set of numbers which don't exist in the table as Bar values if LoremIpsum = LoremIpsum1. My current command:

WITH    q AS
        (
        SELECT convert(bigint, 50000) AS num
        UNION ALL
        SELECT  convert(bigint, num + 1)
        FROM    q
        )
SELECT  top 200 *
FROM    q
where convert(bigint, num) not in (select convert(bigint, Bar) from Foo where LoremIpsum = 68 and isnumeric(Bar + 'e0') = 1)
option (maxrecursion 365);

This query doesn't always work, at a computer the problem was that an nvarchar couldn't be converted to a bigint, probably it was a problem with the settings, the other error occurs if the needed recursion is deeper than the given maxrecursion.

In this case n = 200, k = 50000, LoremIpsum1 = 68

The main problem is that this query is recursive and I would like to modify it to be an iterative query.

Any help is greatly appreciated.

Best regards, Lajos Arpad.

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
  • 1
    Your `SELECT` should be `case when isnumeric(Bar + 'e0') = 1 then convert(bigint, Bar) end` [You cannot rely on the `WHERE` happening before the conversion.](http://stackoverflow.com/questions/5195094/sql-server-predicates-lazy/5195853#5195853) RE: Your recursion issue you could just create a permanent table of numbers or use the cross joined CTE approach [from here](http://stackoverflow.com/questions/10819/sql-auxiliary-table-of-numbers/2663232#2663232) – Martin Smith Apr 12 '11 at 11:06
  • Thank you Martin, but this raises performance issues. Is there an optimization? – Lajos Arpad Apr 12 '11 at 11:15
  • "RE: Your recursion issue you could just create a permanent table of numbers or use the cross joined CTE approach " That is also the solution suggested by Mikael Eriksson, but it raises performance issues and maintenance problems – Lajos Arpad Apr 12 '11 at 11:28
  • @Lajos - The recursive CTE is the **slowest** way of generating such a table so performance should be improved. The cross joined CTE is pretty fast at generating numbers but sometimes I've found when you try and use it in `JOIN`s etc. performance can bomb. – Martin Smith Apr 12 '11 at 11:40
  • Yes, I absolutely agree. In fact this is the main reason I'm asking the question. – Lajos Arpad Apr 12 '11 at 11:44
  • "The cross joined CTE is pretty fast at generating numbers but sometimes I've found when you try and use it in JOINs etc. performance can bomb" Yes, I'm afraid of this too. – Lajos Arpad Apr 12 '11 at 11:50
  • @Lajos - Have you read the answer that I linked to above on numbers tables? – Martin Smith Apr 12 '11 at 11:52
  • Yes, I did. I wanted to change the code to not use recursion. That's why I've asked whether there is an iterative solution for the problem. Both you and Mikael Eriksson suggested the usage of a number table and that's a solution. But... Is it the best? Is there absolutely no possibility to select the first n nonexistent Bar numeric values from a Foo tables without using recursion or using a huge table? If that's the case I'll have to settle with the numeric table solution. So, in fact I'm not arguing whether the number table is better than recursion, I just think there might be a better way. – Lajos Arpad Apr 12 '11 at 12:01
  • Logically this is a very simple concept. We have a set and we don't want to select its rows, we want to select values which are not in the set. I'm sure that this type of problem occured before and I hope there is a feature to solve this issue instead of using workarounds, like recursion or number tables. Both of these solutions are "good", using a number table is better than using recursion, but they are not elegant solutions, we are limited by the maximum recursion deepness and performance issues in the first and by storing a lot of data and a smaller performance issue in the second. – Lajos Arpad Apr 12 '11 at 12:07
  • @Lajos - Well writing an iterative solution is trivial isn't it? You just need to declare an `@int` variable and increment it in a loop (saving the results in a table variable). I fully expect performance to be much worse than a set based solution but only one way to find out! – Martin Smith Apr 12 '11 at 12:07
  • No, no, no... I don't want to use loops. Using explicit iterative solution is even worse than recursion, I agree. But I was talking in implicit iterative solution, using a single query and letting the RDBMS to do the iterative work. For examle select * from Foo where something is implicitly iterative, because MS SQL will search the rows based on "something" using its optimizations, like indexes and data structural optimizations. I'm thinking of something like select myvariable where myvariable is not in select Bar from Foo – Lajos Arpad Apr 12 '11 at 12:13
  • There are no additional options over and above the ones already discussed AFAIK (except maybe a CLR TVF) – Martin Smith Apr 12 '11 at 12:16
  • But I don't know if that's possible. If I will know for sure there is no better solution than creating a table to store millions of numbers I'll accept Mikael Eriksson's solution and implement it. But currently I don't know whether it's the best and I hope there is something even better. Of course, I'm not thinking about explicit iterative loops. – Lajos Arpad Apr 12 '11 at 12:16
  • Well, thank you Martin for your patience and your comments, I'll google for some solutions and will let you know if I find something better. Thank you again. – Lajos Arpad Apr 12 '11 at 12:17
  • 1
    @Lajos – The fastest could be a combo of iterative loop and number table. Use a number table with a fair amount of rows and write an iterator that iterates over the solution I provided moving the offset by the count of the number table. I have no idea if that would be faster or not but it is sure worth testing. You could then test with different sizes of the number table to find the "sweet spot". – Mikael Eriksson Apr 12 '11 at 12:23
  • @Lajos - How many rows per `LoremIpsum` do you have? And how many rows per `LoremIpsum,Bar` combination on average? I assume that `Bar` has no useful index on it as it seems that it contains a mix of data types? – Martin Smith Apr 12 '11 at 12:24
  • Thank you Mikael Eriksson, that sounds like an idea worth to try – Lajos Arpad Apr 12 '11 at 12:26
  • number_of_rows(LoremIpsum) = infinity, number_of_rows(LoremIpsum, Bar) = 1. Unfortunately there is no useful index on Bar and there is truly a mess in its values in this moment. – Lajos Arpad Apr 12 '11 at 12:29
  • @Lajos - What does the plan look like? What `JOIN` type is being used for the `NOT IN` also does SQL Server add any spools? Can you create a computed column that does the cast to bigint if the numeric test passed and index that? – Martin Smith Apr 12 '11 at 12:41
  • 1
    `ALTER TABLE Foo ADD BarBigInt AS CASE WHEN ISNUMERIC(Bar + 'e0') = 1 THEN CONVERT(BIGINT, Bar) END` and `CREATE NONCLUSTERED INDEX ix ON Foo(LoremIpsum,BarBigInt)`. I was originally thinking a filtered index but [doesn't look like that is possible](http://connect.microsoft.com/SQLServer/feedback/details/518328/should-be-possible-to-create-a-filtered-index-on-a-deterministic-persisted-computed-column) – Martin Smith Apr 12 '11 at 13:54
  • Hello, I've chosen Mikael Eriksson's answer, because I've implemented an improved version of the idea. +1 for Martin's previous comment, because indexing is generally a good idea, but, we should not forget that indexing speeds up selection but slows down insertion and update. Indexing is not always a good idea, it always depends on the circumstances. In my case indexing is not a good idea, but in most cases an index can help a lot, so +1 to Martin. – Lajos Arpad Apr 14 '11 at 17:06

2 Answers2

2

You can use a permanent number table that has enough numbers for your need instead of the recursive cte.

create table NumberTable(Num int primary key)

You can use your recursive cte to populate the number table. Since it is just a one time thing the performance should not be that big deal and if it is you can have a look here.

WITH    q AS
        (
        SELECT 0 AS num
        UNION ALL
        SELECT  num + 1
        FROM    q
        )
insert into NumberTable(Num)        
select top 10000 num
from q
option (maxrecursion 0) 

Use the number table in your query with the offset.

declare @n int = 200
declare @k int = 50000

SELECT  top (@n) num+@k
FROM    NumberTable
--where convert(bigint, num+k) not in (select ...
Community
  • 1
  • 1
Mikael Eriksson
  • 136,425
  • 22
  • 210
  • 281
  • 1
    Thank you Mikael Eriksson, but the problem is that this number table will be huge, it will take a lot of time to repeatedly query millions of numbers from a number table. However, I'll think about your idea. Thank you very much. – Lajos Arpad Apr 12 '11 at 11:20
  • @Lajos – It needs to be as big as your largest possible value for n. I`m just curious, what would that be? – Mikael Eriksson Apr 12 '11 at 11:31
  • 1
    That's a very good question. Foo will be growing constantly, so, in fact there is no maximum value. In fact this task was invented to keep the values of Bar as small as possible, initially they were unique in Foo, but the values got so huge they became a user experience issue. – Lajos Arpad Apr 12 '11 at 11:35
  • 1
    Well, there is no better solution than this one, so I accept this solution as an answer. Although, there can be some improvements on it. However, the base of idea is what your described. Thank you again. – Lajos Arpad Apr 12 '11 at 15:25
  • @Lajos with "up to infinity" rows per `LoremIpsum` partition I really think a lack of index on `LoremIpsum, Bar` is where you should be concentrating your attention! – Martin Smith Apr 12 '11 at 16:17
  • 1
    In fact my solution is efficient in this moment, I've created a number table and I'm filling it in some events with a number of rows for each LoremIpsum and when they are used up they are deleted. I have a lot of inserts on the table and they will be slowed down by an index, however, selection will be quicker in that case. If optimization will be needed for the selections, probably an index will be added, but I don't want to make premature optimizations. Anyway, thanks guys, all of you gave good ideas. – Lajos Arpad Apr 14 '11 at 17:12
1

The following view will yield up to 2^32 numbers, and could be extended to give more if necessary:

create view numbers AS
with
  n as (select 0 as n union select 1)
, nn as (select 0 as n from n as n1, n as n2, n as n3, n as n4)
, nnn as (select 0 as n from nn as n1, nn as n2, nn as n3, nn as n4)
, nnnn as (select 0 as n from nnn as n1, nnn as n2)
select row_number() over (order by n) AS n from nnnn

go

select n from numbers where n <= 50000

Regrettably there is a sort stage in the query plan, so it starts to slow down noticeably as n gets really large. However, on my desktop machine the slowdown does not really become apparent until generating around 1,000,000 rows.

You may see better performance if you work the exhibited strategy directly into your full query (rather than as a standalone view) as the number of rows that need to be sorted would reduced to the minimum.

WReach
  • 18,098
  • 3
  • 49
  • 93
  • +1. This is in fact a nice idea, but for my (solved) problem it's not efficient, because 1,000,000 is not a big number in my case. Anyway, thank you for this idea, it may be helpful at other problems or for other persons. – Lajos Arpad Apr 14 '11 at 17:15