
Through experimentation, and to my surprise, I have found that LEFT JOINing a point-table is much faster on large tables than simply assigning a single value to a column. By a point-table I mean a 1x1 table (one row and one column).

Approach 1. By simply assigning a value, I mean this (slower):

SELECT A.*, 'Value' as NewColumn
FROM Table1 A

Approach 2. By left-joining a point-table, I mean this (faster):

WITH B AS (SELECT 'Value' as NewColumn)
SELECT *
FROM Table1 A
LEFT JOIN B
ON A.ID <> B.NewColumn

[Figure: Join table to a point]

Now the core of my question. Can someone advise me how to get rid of the whole ON clause:

ON A.ID <> B.NewColumn?

Checking the join condition seems like an unnecessary waste of time, because the key of table A must never equal the key of table B. It would throw rows out of the results if A.ID ever had the same value as 'Value'. Removing that condition, or perhaps changing <> to =, looks like further room to speed up the join.
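
One variant raised in the comments below keeps the LEFT JOIN but makes the condition trivially true, so no meaningful per-row comparison happens; a minimal sketch, using the same Table1 as above:

WITH B AS (SELECT 'Value' as NewColumn)
SELECT *
FROM Table1 A
LEFT JOIN B
ON 1 = 1

Logically this returns the same rows as a CROSS JOIN against the one-row table; whether the optimizer treats the two forms differently is exactly what is in question.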

Update February 23, 2015

Bounty question, addressed to performance experts: which of the approaches mentioned in my question and in the answers is the fastest?

  • Approach 1: simply assigning a value,
  • Approach 2: left-joining a point-table,
  • Approach 3: cross-joining a point-table (thanks to the answer of Gordon Linoff),
  • Approach 4: any other approach which may be suggested during the bounty period.

As I have empirically measured the query execution times of the first three approaches, the second approach with LEFT JOIN is the fastest, then the CROSS JOIN method, and last of all simple value assignment. Surprising as it is, a performance expert with a Solomon's sword is needed to confirm or deny it.
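
For anyone wanting to reproduce the measurement, a minimal harness along the lines suggested in the comments (SET STATISTICS TIME/IO) might look like this; it assumes the same Table1 with an ID column as in the examples above:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Approach 1: simple value assignment
SELECT A.*, 'Value' AS NewColumn
FROM Table1 A;

-- Approach 2: LEFT JOIN to a point-table
WITH B AS (SELECT 'Value' AS NewColumn)
SELECT *
FROM Table1 A
LEFT JOIN B ON A.ID <> B.NewColumn;

-- Approach 3: CROSS JOIN to a point-table
WITH B AS (SELECT 'Value' AS NewColumn)
SELECT *
FROM Table1 A
CROSS JOIN B;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;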

Przemyslaw Remin
  • I would be very surprised if this was actually faster. The two queries have the same execution plan. But yeah Gordon is correct, you want a `CROSS JOIN`. – ChrisV Feb 11 '15 at 12:44
  • I tried 3 ways (1) Simple alias of value SELECT *, 1 AS NewColumn - this is the solution I called slower, (2) LEFT JOIN B on condition that must not be met, (3) CROSS JOIN. In the same time of 41 seconds the 3 queries selected (1) 177497 rows, (2) 234708 rows, (3) 198036 rows. So the winner is LEFT JOIN. I am not sure if I was able to maintain the same comparable circumstances in all 3 queries on my machine. Nevertheless, this competition is worth an eye of an expert arbiter. – Przemyslaw Remin Feb 12 '15 at 08:15
  • Maybe this is the rare scenario of fast performance of LEFT JOIN described here in the answer by dbenham awarded with +50 bounty (not the one at the top marked as accepted) http://stackoverflow.com/questions/2726657/inner-join-vs-left-join-performance-in-sql-server – Przemyslaw Remin Feb 12 '15 at 08:34
  • The example you have given is a long way from real-world use, and performance in a real-world example is likely to be affected by other factors such as the data stored, other joins that may be required, indexes etc. If you could show more detail about where and why you need to improve performance, it may help you get a better answer. Why do you need to assign the same value to all rows in your output? Is there a case where you may need other values in that column which would lead you to change the joins required (which would impact your entire query plan anyway, rendering the example invalid)? – Jamie Pollard Feb 12 '15 at 09:51
  • Perhaps I am missing something obvious, but your join criteria `ON A.ID <> B.NewColumn` for an outer join would result in the extra column in the dataset always having a NULL value, which is counter to the original requirement? – gvee Feb 23 '15 at 12:48
  • @gvee no, as long as the condition is met, i.e. as long as nothing in A.ID equals B.NewColumn, you get results as shown in the picture of approach 2. – Przemyslaw Remin Feb 23 '15 at 12:55
  • @PrzemyslawRemin I see. I was just testing it and spotted my mistake. Incidentally, this will only work if the datatypes in the join are comparable. Wouldn't it be simpler to change the condition to `ON 1=1`? Some very quick tests show this to be marginally more efficient. Incidentally, in all my tests so far a CROSS JOIN comes out on top... – gvee Feb 23 '15 at 13:00
  • @PrzemyslawRemin you say that when you ran the queries they selected a different number of rows in the same time. I don't understand what you mean by this, because if the queries are identical then they should return the same number of rows unless there is some other activity on the server inserting and deleting rows. Try using SET STATISTICS IO ON and SET STATISTICS TIME ON and then posting your results – Steve Ford Feb 24 '15 at 11:55
  • What happens to the times if you execute them in a different order, e.g. approach 2 followed by approach 1? – Steve Ford Feb 24 '15 at 12:17
  • Just wondering if the SQL core engine could compile/work in a better way if you declare your 'value' in a variable for approach no. 1: DECLARE @value varchar(10) SET @value = 'Value' SELECT A.*, @value as NewColumn FROM Table1 A – Rubik Feb 24 '15 at 17:38
  • Have you looked at the query plans you get? I guess you have a parallel plan for your left outer join version and not for the others. It would be helpful if you could post the table structure of your table, the number of rows, and the plans you get for the join and the constant versions. In my testing I have seen that for some very specific numbers of bytes per row I get better performance with the left join version, but only if the plan is parallel. Also, you join a varchar constant against an ID column that looks to be integer. That should not work, so please post the table structure. – Mikael Eriksson Mar 01 '15 at 16:43

6 Answers


I'm surprised this is faster for a simple expression, but you seem to want a cross join:

WITH B AS (SELECT 'Value' as NewColumn)
SELECT *
FROM Table1 A CROSS JOIN
     B;

I use this construct to put "parameters" in queries (values that can easily be changed). However, I don't see why it would be faster. If the expression is more complicated (such as a subquery or very complicated calculation), then this method only evaluates it once. In the original query, it would normally be evaluated only once, but there might be cases where it is evaluated for each row.
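
For example, with a more expensive expression the benefit is clearer; this is only a sketch, and SomeBigTable/SomeCol are made-up names:

-- The aggregate is computed once in the CTE rather than
-- potentially once per row of Table1.
WITH B AS (SELECT (SELECT MAX(SomeCol) FROM SomeBigTable) AS NewColumn)
SELECT *
FROM Table1 A CROSS JOIN
     B;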

Gordon Linoff

You can also try with CROSS APPLY:

SELECT A.*, B.*
FROM Table1 A
CROSS APPLY (SELECT 'Value' AS NewColumn) B
Giorgi Nakeuri

Can you try to insert into a temp table instead of outputting to screen:

SELECT A.*, 'Value' as NewColumn
INTO #Table1Assign
FROM Table1 A

and

WITH B AS (SELECT 'Value' as NewColumn)
SELECT *
INTO #Table1Join
FROM Table1 A
LEFT JOIN B
ON A.ID <> B.NewColumn

That takes the actual transmission and rendering of the data to SSMS out of the equation, so the comparison is not distorted by network slowdown or processing on the client.

When I run this with a 1M row table, I consistently get better performance with the simple assigning method, even if I switch to CROSS JOIN for the join method.
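
The CROSS JOIN variant with SELECT ... INTO follows the same pattern; a sketch (#Table1Cross is an assumed name):

WITH B AS (SELECT 'Value' as NewColumn)
SELECT *
INTO #Table1Cross
FROM Table1 A
CROSS JOIN B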

jlee-tessik

I doubt that the second approach will be faster, with three SELECTs and a LEFT JOIN. First of all, you should test the same query repeatedly with various sample data.

What is the real scenario like?

An INNER JOIN will definitely be faster than a LEFT JOIN.

How about this?

DECLARE @t TABLE (id INT, c2 VARCHAR(10))
INSERT INTO @t
SELECT 1, 'A' UNION ALL
SELECT 2, 'A' UNION ALL
SELECT 3, 'B' UNION ALL
SELECT 4, 'B'

DECLARE @t1 TABLE (NewCol VARCHAR(10))
INSERT INTO @t1 VALUES ('Value')

-- Approach 1
--SELECT * FROM @t OUTER APPLY @t1

-- Create an index on both join columns
-- Approach 2
SELECT * FROM @t a INNER JOIN @t1 b ON a.c2 <> b.NewCol

-- Approach 3
DECLARE @value VARCHAR(20)
SELECT @value = NewCol FROM @t1

SELECT *, @value AS value FROM @t
KumarHarsh

Too much text for a comment, so I've added this as an answer, although I'm really adding to the question more than answering it (**).

Somehow I think this is going to be one of those 'it depends' situations. I think it depends a lot on the number of rows involved, and even more on what happens afterwards with the data. Is it simply returned, is it used in a GROUP BY or DISTINCT later on, do we further JOIN or calculate with it, etc.?

Anyway, I think this IS an interesting question, in that I've had to find out the hard way that having a dozen 'parameters' in a single-row temp-table was faster than having them assigned upfront to 12 variables. Many, many moons ago, the code I was given looked like an absurd construction to me, so I rewrote it to use @variables instead. This was in a 1000+ line stored procedure which needed some extra performance squeezed out of it. After quite a bit of refactoring, it turned out to run remarkably slower than before the change?!

I've never really understood why, and at the time I simply reverted to the old version. My best guess is some weird combination of parameter sniffing vs. (auto-created?) statistics on the temp-table in question; if anyone could shed some light on your question, it would probably lead to an answer for mine too =)
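
Roughly, the two patterns looked like this; a hedged sketch with made-up table and parameter names, nowhere near the size of the real procedure:

-- Pattern 1: 'parameters' kept in a single-row temp-table (the original, faster code)
CREATE TABLE #params (StartDate DATE, EndDate DATE);
INSERT INTO #params VALUES ('2015-01-01', '2015-12-31');

SELECT o.*
FROM Orders o
JOIN #params p ON o.OrderDate BETWEEN p.StartDate AND p.EndDate;

-- Pattern 2: the same values assigned upfront to variables (the refactor that ran slower)
DECLARE @StartDate DATE = '2015-01-01', @EndDate DATE = '2015-12-31';

SELECT o.*
FROM Orders o
WHERE o.OrderDate BETWEEN @StartDate AND @EndDate;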

(**: I realize SO is not a forum, so I apologise upfront; I simply wanted to chime in that the observed behaviour of the OP isn't entirely anecdotal.)

deroby

SELECT * doesn't use indexes properly in SQL Server; you should always specify your columns.

Other than that I would use

DECLARE @Value VARCHAR(30) = 'Value'
SELECT t.Id, t.C2, @Value AS NewColumn
FROM Table1 t
TonyM
  • Although it's true that selecting all columns might cause a lookup that could otherwise be avoided by selecting just the fields you need from a (covering) index, selecting all fields explicitly will not differ in any way from selecting the fields implicitly by means of the *-sign (see the sketch below). – deroby Mar 01 '15 at 20:47
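
For illustration of the covering-index point, assuming the Id and C2 columns from the answer above, such an index might be declared like this (index name and column choice are assumptions):

-- An index that 'covers' the query: Id is the key, C2 is carried along,
-- so the query can be answered without a lookup into the base table.
CREATE NONCLUSTERED INDEX IX_Table1_Id_C2 ON Table1 (Id) INCLUDE (C2);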