0

I have created a couple of views to return data of interest from a database, with the second view returning data from the first view rather than the base tables.

The first view (variantsToScore) pulls the relevant data together from across several tables. The second view (variants_to_score_by_transcript) then filters this returned data.

CREATE VIEW variantsToScore AS
SELECT v.variantID, v.chromosome, v.position, v.referenceBase, v.alternateBase, v.[HGVS_c], v.[HGVS_p], s.geneID, s.transcriptID, a.[1000g2014oct_all], a.[1000g2014oct_afr], a.[1000g2014oct_eur], a.[1000g2014oct_amr], a.[1000g2014oct_eas], s.scorer, s.scoreDate, s.score, s.checked, s.checker, t.reportTranscript
FROM dbo.variant v, dbo.varScore s, dbo.transcript_relevance t, dbo.varAnnotation a
WHERE v.variantID = s.variantID
AND s.geneID = t.geneID
AND s.transcriptID = t.transcriptID
AND v.concatField = a.concatField
AND v.variantID = a.variantID
AND s.geneID = a.geneID
AND s.transcriptID = a.transcriptID



Create View variants_to_score_by_transcript as
Select *
From variantsToScore
Where (reportTranscript = 'Y')

Union
Select *
From variantsToScore
Where variantID not in (

Select variantID
From variantsToScore
Where reportTranscript = 'Y');

Returning the first view is pretty quick - even though it returns several thousand rows. However, returning the second view (the filtered data) is quite a slow process.

I'm not very confident in the design of this second view and hope there is a better way to go about filtering the results than I have created. Also, in reading around I have come across multiple articles and posts which are not altogether positive about nesting views on principle.

I am not sure whether the poor performance is due to the design of the second view, or the nesting of the views, or maybe a combination of the two. Perhaps my design should cut out the nesting of the views altogether and try to return and filter the results all in one view? Will it be more useful to redesign the second view to improve performance than worrying about combining the views into one?

Any thoughts and reccomendations would be welcome. Cheers.

EDIT:

To try and clear up what I want to achieve with the second view I have included an expected table of results to be returned by the first view, and a desired table of results to be returned by the filtering of the second view.

Expected results before filtering: -

variantID | transcriptID | reportTranscript
    a     |       x      |         Y
    a     |       y      |         N
    a     |       z      |         N
    b     |       x      |         Y
    b     |       y      |         Y
    b     |       z      |         Y
    c     |       x      |         N
    c     |       y      |         
    c     |       z      |         N

Desired results after filtering: -

variantID | transcriptID | reportTranscript
    a     |       x      |         Y
    b     |       x      |         Y
    b     |       y      |         Y
    b     |       z      |         Y
    c     |       x      |         N
    c     |       y      |         
    c     |       z      |         N
Chris W
  • 21
  • 1
  • 4
  • Do you have an index on reportTranscript? – Shnugo Jul 10 '15 at 10:06
  • 1
    Your second view doesn't do what you think it does. And `UNION` is going to be massively inefficient. Views have **no** inherent ordering (that's why it doesn't do what you think it does) and `UNION` (without `ALL`) is forced to find and eliminate any duplicate rows. Finally, Views are like macros - they're "expanded out" in the query that makes use of them before optimization occurs - so nesting shouldn't have any effect. – Damien_The_Unbeliever Jul 10 '15 at 10:13
  • Shnugo - I don't currently no. This is a user-updated field and so will contain NULLs until users update the entry. Can a field still be indexed in this situation? – Chris W Jul 10 '15 at 10:16
  • Damien - Thanks. Any thoughts on the best approach to refine the second view? – Chris W Jul 10 '15 at 10:20
  • @ChrisW: Off course you can have an index on a column with nulls... It will speed up any filtering (WHERE...), JOINSs and comparisons – Shnugo Jul 10 '15 at 10:30
  • @Shnugo - Thanks. Will try indexing that column now. – Chris W Jul 10 '15 at 10:41

1 Answers1

2

Based on sample data/expected result, something like this should work:

Create View variants_to_score_by_transcript
as
   With Reports as (
     select *,MAX(ReportTranscript) OVER (PARTITION BY variantID) as RepTran
     from variantsToScore 
   )
   select * from Reports
   where RepTran <> 'Y' or ReportTranscript = 'Y'

That is - we include rows either if they have ReportTranscript equal to Y or no other rows for the same variantID have a ReportTranscript of Y.

What you appear to be trying to do with your second view is to sort the rows produced by the first view1. But this doesn't make sense - views (and tables) have no inherent ordering - the only way to guarantee the order of a set of results is to use an ORDER BY clause on the outermost query.

So, we could write your second view as a stored procedure:

Create Procedure variants_to_score_by_transcript
as
Select *
From variantsToScore
order by case when reportTranscript='Y' THEN 0 ELSE 1 END


For your first view, I'd at least suggest a stylistic change since you're still using pre-ANSI joins. ANSI joins were standardized in 1992 - so it's probably time to get used to them:

CREATE VIEW variantsToScore AS
SELECT v.variantID, v.chromosome, v.position, v.referenceBase, v.alternateBase,
       v.[HGVS_c], v.[HGVS_p], s.geneID, s.transcriptID, a.[1000g2014oct_all],
       a.[1000g2014oct_afr], a.[1000g2014oct_eur], a.[1000g2014oct_amr],
       a.[1000g2014oct_eas], s.scorer, s.scoreDate, s.score, s.checked, s.checker,
       t.reportTranscript
FROM dbo.variant v
       INNER JOIN
     dbo.varScore s
       ON
         v.variantID = s.variantID
       INNER JOIN
     dbo.transcript_relevance t
       ON
         s.geneID = t.geneID and
         s.transactionID = t.transactionID
      INNER JOIN
     dbo.varAnnotation a
      ON
         v.concatField = a.concatField and
         v.variantID = a.variantID and
         s.geneID = a.geneID and
         s.transcriptID = a.transcriptID

But this shouldn't have any performance impacts.


1If you were actually also trying to eliminate duplicate rows, I'd suggest strongly that you identify what's causing duplicate rows to be produced (this is usually a logic issue), or, if they still need to be eliminated, use the DISTINCT option in the SELECT clause, rather than using UNION for this.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
  • you were one of those who voted to close my question here: http://stackoverflow.com/q/31289331/5089204. Your solution with a stored procedure is a perfect exampel of - in my point of view! - **very bad** approach. If it was an inline UDF you could use it directly everywhere (CTE, Join,,,) So: Why do you think is good? And why did you vote for closing my question? – Shnugo Jul 10 '15 at 10:27
  • @Shnugo - an inline UDF can't use `ORDER BY` either (except for the same hacky workarounds people suggest to use them in views). If you want `ORDER BY` to be applied to a query stored in SQL Server, and you want that `ORDER BY` to be used, specifically, to guarantee the order of results (as opposed to giving semantics to `TOP` or `FETCH` clauses) than the only supported place to put that would be within a stored procedure. – Damien_The_Unbeliever Jul 10 '15 at 10:32
  • @Damien - Cheers. I intended to filter results in the second view. A _variantID_ may occur multiple times, depending on entries in the _transcriptID_ column, with corresponding entries in the _reportTranscript_ either 'Y', 'N' or NULL. I want to return all entries where the _reportTranscript_ column is 'Y'. Additionally, for any variantIDs where there are no 'Y's in the _reportTranscript_ column, I would like all entries for that variantID to be returned as well. I worry that I may not be explaining this particularly clearly. Is this what you understood as what I was trying to achieve? – Chris W Jul 10 '15 at 10:39
  • @ChrisW - maybe edit your question and add *sample data* and *expected results*. (You may want to trim down the number of tables involved, or just give us the sample data as "this is what view 1 returns") – Damien_The_Unbeliever Jul 10 '15 at 10:41
  • @Damian, I disagree: You yourself state that the order by must be done in the outermost SELECT. So the right way is (assuming there is an inline UDF) a call like SELECT * FROM dbo.MyFunction(@prm) ORDER BY aField – Shnugo Jul 10 '15 at 10:43
  • 1
    @Shnugo - no, if you want to apply the `ORDER BY` to a query and you're not seeking to *store* that query within SQL Server then there's no need for a UDF at all - you just have view 1 and a query that makes use of it. I'm, specifically, showing how you can *store* the ordering requirement within the database. – Damien_The_Unbeliever Jul 10 '15 at 10:46
  • @Damian - I have included tables of expected results before, and desired results after, the filtering to be carried out by the the second view in an edit to the question. – Chris W Jul 10 '15 at 11:00
  • @Damian - Perfect. Cheers mate. Will give it a roll. – Chris W Jul 10 '15 at 11:10