22

I have normalized tables in a database and to denormalize it, I created a view out of two tables. When I tried to create a clustered index on the view, it wouldn't let me, as the view was created with a left outer join. I used a left join because I want the null values to show up in the resulting view, much like how it was suggested in this earlier post.

Question on join where one column one side is null

The table structure and relationship is very much similar to what was described in the above link.

I seemed to hit a wall here as I couldn't convert my left join into an inner join, as that would exclude all records with null values on any of the joined columns. My questions are:

  1. Why is indexing not allowed on outer or self joins?
  2. Are there any performance hits on this kind of un-indexed view?
  3. Anyone knows any workaround to this problem?

I've just finished a SQL Server course yesterday so don't know how to proceed. Would appreciate any comments. Cheers.

Community
  • 1
  • 1
Noble_Bright_Life
  • 569
  • 3
  • 9
  • 16
  • Do you need columns from the OUTER table? – gbn Jun 25 '11 at 09:38
  • 1
    If you really need the indexed view, you can create the (indexed) INNER JOIN one. Then you can LEFT JOIN the original (left side) table to this view to get the wanted output but with the performance gains of the indexed view. – ypercubeᵀᴹ Jun 25 '11 at 09:44
  • @gbn & @ypercube: yeah, I need the column on the left outer table, and unfortunately, that is where I need to have the unique clustered index on. – Noble_Bright_Life Jun 25 '11 at 13:06

5 Answers5

15

Here is an alternative. You want a materialized view of A not containing B. That isn't directly available... so instead, materialize two views. One of all A's and one of only A's with B's. Then, get only A's not having B's by taking A except B. This can be done efficiently:

Create two materialized views (mA and mAB) (edit: mA could just be the base table). mA lacks the join between A and B (thus containing all A's period [and therefore containing those records WITHOUT matches in B]). mAB joins between A and B (thus containing only A's with B's [and therefore excluding those records WITHOUT matches in B]).

To get all A's without matches in B, mask out those that match:

with ids as (
  select matchId from mA with (index (pk_matchid), noexpand)
  except
  select matchId from mAB with (index (pk_matchid), noexpand)
)
select * from mA a join ids b on a.matchId = b.matchId;

This should yield a left anti semi join against both your clustered indexes to get the ids and a clustered index seek to get the data out of mA you are looking for.

Essentially what you are running into is the basic rule that SQL is much better at dealing with data that IS there than data that ISN'T. By materializing two sources, you gain some compelling set based options. You have to weigh the cost of these views against those gains yourself.

cocogorilla
  • 1,815
  • 14
  • 36
12

There is a "workaround" here that involves check for NULL in the join and having a NULL representation value in the table

NULL value

INSERT INTO Father (Father_id, Father_name) values(-255,'No father')

The join

JOIN [dbo].[son] s on isnull(s.father_id, -255) = f.father_id
Magnus
  • 45,362
  • 8
  • 80
  • 118
  • 19
    @Damien, @gbn: Is it only me that this "workaround" seems awful? What happens to other queries that have `SELECT COUNT(*) FROM rightTable`? You'll have to change them to correct the error (by 1) they'll give after the workaround? – ypercubeᵀᴹ Jun 26 '11 at 09:00
  • 2
    @ypercube your right, it is awful and not really a workaround. But it is the only way to achive what you want. A better solution may be to index the underlying tables and skip the indexed view. – Magnus Jun 26 '11 at 09:06
  • 5
    @Damien, as the old idiom goes: beggars can't be choosers – Martin Capodici Oct 24 '17 at 03:05
7

I don't think there is a good workaround. What you can do about this is to create a real table from the view and set indexes on that. This can be done by a stored procedure that is called regularly when data is updated.

Select * 
into <REAL_TABLE>
From <VIEW>

create CLUSTERED index <INDEX_THE_FIELD> on <REAL_TABLE>(<THE_FIELD>)

But this is only a noteworthy approach if data isn't updated every few seconds.

Klaus
  • 109
  • 1
  • 1
  • 3
    It's a good solution. There is no need to update whole table every time. We can write trigger and update only changed rows. – Dmitriy Startsev Mar 18 '15 at 11:56
  • As much as I *dislike* manual table updates.. stuff can be 'lost' easily.. this can be viable in certain situations. – user2864740 Oct 29 '18 at 18:36
  • Problem for batch operations, performance would be slow. Record update delete needs to be taken care as well, adds more maintenance. – suhas0sn07 Apr 16 '19 at 05:02
1

Logically you are making two separate queries. 'A LEFT JOIN B' is just shorthand for '(A JOIN B) UNION A'

The first query is table A inner joined to table B. This gets an indexed view, since this is where all the heavy lifting is done.

The second query is just table A where any of the join columns are null. Make a view that produces the same output columns as the first query and pads them with nulls.

Just union the two results before returning them. No need for a workaround.

Anon
  • 10,660
  • 1
  • 29
  • 31
  • 9
    This still won't work as INDEX'es aren't allowed in views that use the keywords UNION, INTERSECT, or EXCEPT – a11smiles Apr 27 '15 at 13:39
  • 1
    Try it before declaring it won't work. A `UNION` of views *can* use the indices on the underlying views. – Anon Apr 27 '15 at 16:33
  • 5
    I just tried it and it's NOT possible to have an indexed view which uses UNION. Check here https://msdn.microsoft.com/en-us/library/ms191432.aspx#Restrictions – ilija veselica Jul 14 '15 at 13:51
  • 1
    Create index view C for A JOIN B, then use normal view for C union A. – Irawan Soetomo Mar 15 '16 at 11:04
0

I'll work on an answer to 1, but for now:

[2]. The view will be no more nor less performant than the equivalent query on the udnerlying tables. All the usual advice applies about having covering indexes, preferably an index on the joined columns, etc.

[3]. There's no real workaround. Most of the restrictions on indexed views exist for very good reasons, once you dig into them.

I'd just create the view, generally, and do no more, unless there was a specific performance problem.

I'll try to add an answer for 1 once I've reconstructed it in my own mind.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
  • 2
    Lazyness from MS programmers to construct indexed views for `a LEFT JOIN b LEFT JOIN ... LEFT JOIN z` ? – ypercubeᵀᴹ Jun 25 '11 at 09:46
  • 7
    @ypercube - most of the restrictions exist because they allow the indexed view to be maintained without having to perform full table scans for one or more base tables (e.g. why the only aggregates allowed are SUM, and then only if COUNT_BIG is also included), or perform other expensive operations. – Damien_The_Unbeliever Jun 25 '11 at 09:51
  • Thanks for answering my other questions. Cheers. – Noble_Bright_Life Jun 25 '11 at 13:19
  • 1
    Except a LEFT JOIN (at least with some join condition restrictions) doesn’t require any more index scans more than an INNER JOIN. Not supporting this basic LEFT JOIN scenario makes Indexed Views garbage for extending schemas: so in this case it’s a Feature/Functionality Defect. – user2864740 Jan 02 '21 at 21:51