0

So the problem is that I have a dataset where each row contains the following data:

   ID, data, year, meta

The data contained in different year varies a bit, but some data is there every year. I want to query data that only exists in some years, but I also want to query the data that exists every year in the query.

For example:

  • DATA1 can be in years 2011-2013,
  • while DATA2 can be in years 2009-2015.

I want to query for both of them, where they have a matching ID and both exist only in that year. So far a working solution that I have is this.

    SELECT ID, data, year, meta
    FROM table1 WHERE year IN 

    (
    SELECT year
    FROM table1
    WHERE meta LIKE 'DATA1'
    )

   AND meta LIKE 'DATA2'

   UNION ALL 

   (
   SELECT ID, data, year, meta
   FROM table1
   WHERE meta LIKE 'DATA1'
   )

   ORDER BY year, ID

the problem here is that the query

   SELECT ID, data, year, meta
   FROM table1
   WHERE meta LIKE 'DATA1'

is repeated once (it is used as a condition in the first query, and then it is also queried for itself. I was wondering if there's a way to query for what I want, without repeating the above query. I.e. i want to do a query, return the results, then do another query that only has data that overlaps within the first query while only doing two queries in total. example input:

    122, DATA1_datapoint, 2010, DATA2
    122, DATA1_datapoint, 2009, DATA2
    123, DATA1_datapoint, 2011, DATA2
    123, DATA2_datapoint, 2011, DATA1
    124, DATA1_datapoint, 2012, DATA2
    124, DATA2_datapoint, 2012, DATA1

Expected output (id is 3-digit number, e.g. 123):

    123, DATA1_datapoint, 2011, DATA1
    123, DATA2_datapoint, 2011, DATA2
    124, DATA1_datapoint, 2012, DATA1
    124, DATA2_datapoint, 2012, DATA2
Christian W.
  • 2,532
  • 1
  • 19
  • 31
  • Are you asking about `UNION` vs `UNION ALL`? One combines everything, one takes the only the unique entries; `{1, 2, 3} UNION ALL {3, 4, 5} => {1, 2, 3, 3, 4, 5}` *(the `3` is repeated)* where as `{1, 2, 3} UNION {3, 4, 5} => {1, 2, 3, 4, 5}` *(The `3` is not repeated)*. Note this also means that duplicates in either of your inputs will be de-duplicated too... `{1, 2, 3, 3} UNION {4, 5} => {1, 2, 3, 4, 5}` *(Whether this is what you want or not, you are well advised to include some actual example data and the results you expect from that data.)* – MatBailie Oct 26 '15 at 12:44
  • I dont see where you repeat that select? the query where you use `like 'DATA1' are different – Juan Carlos Oropeza Oct 26 '15 at 12:49
  • I edited expected output to the post. The idea is that i want to do two queries, where the same ID can be found in both years (and therefore if metavalue (DATA2) is found in a set where (DATA1) is present, it means that they have been recorded in the same year. I explicitly want UNION ALL, although there should be no duplicates either way. – Christian W. Oct 26 '15 at 12:50
  • @telac - Without example input data the example output data is not going to help much. – MatBailie Oct 26 '15 at 12:51
  • I added example input data! thanks for the suggestion. – Christian W. Oct 26 '15 at 12:53
  • 1
    Maybe I'm missing something but from what I can see neither your original query nor the accepted answer produces the expected output you provide (see [this sqlfiddle](http://www.sqlfiddle.com/#!6/b365e/2)). To get the expected output you could instead use any of the queries in this [sqlfiddle](http://www.sqlfiddle.com/#!6/b365e/3). Or maybe I misunderstood something? Or is your sample expected output incorrect? – jpw Oct 26 '15 at 13:14
  • Yeah, the mistake in the input and output data was the ratio between DATA1 and DATA2 (there was supposed to be more of DATA2). Example data should now match the query. The idea in the original input/output was still correct though, DATA1 and 2 had just swapped places in the query / in the example data. (the query I provided brings the wanted results in real environment at least). – Christian W. Oct 26 '15 at 13:49
  • Why LIKE without wildcards? – jarlh Oct 26 '15 at 13:59
  • the LIKE was a remainder from some other queries where I used wildcards, where I wanted for example data10-19. I see now that my question was a bit poorly formatted though. The last query that @jpw provided was pretty amazing though, really simple, yet I had not thought of doing it like that. – Christian W. Oct 26 '15 at 14:01

1 Answers1

2

What about you create a CTE?

WITH CTE AS ( 
   SELECT ID, data, year, meta
   FROM table1
   WHERE meta LIKE 'DATA1'
) 
SELECT ID, data, year, meta
FROM table1 
WHERE year IN (SELECT year FROM CTE)
  AND meta LIKE 'DATA2'
UNION ALL 
SELECT ID, data, year, meta 
FROM CTE
ORDER BY year, ID 
Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • 1
    Ahhh, this makes sense, I understand the question now... It is worth noting that in SQL-Server CTE's are not cached result-sets. This means that this does help with Don't-Repeat-Yourself *(DRY)* principles, but does not avoid the redundant load on the server. – MatBailie Oct 26 '15 at 12:50
  • Hey, thanks! This is what I was looking for. @MatBailie, do you know if there's even a better way to do this, or this is as good as it gets? – Christian W. Oct 26 '15 at 12:56
  • @MatBailie Could you include a link talking about that fact? I really tought was some kind of precalculated set. – Juan Carlos Oropeza Oct 26 '15 at 12:59
  • @JuanCarlosOropeza - No one link in particular. But if you search for `SQL Server CTE Materialization` or `SQL Server CTE Expansion` you'll find a lot about it distributed around the web. In over simplified terms a CTE is like a macro and undergoes macro-like-expansion *(just like views normally do)*. Here is a SO link though... http://stackoverflow.com/questions/690465/which-are-more-performant-cte-or-temporary-tables – MatBailie Oct 26 '15 at 13:20