2

I have a database where I create a new table every day and fill it with data (I know this isn't ideal, but I can't change this). Each table is of the form "TESTdata_xxxxx_DB", where xxxxx is incremented by 1 every day.

I need a simple way to select the top 1000 rows (where a specific condition is met) across many of those tables (i.e. across many dates). For example, I need to query across tables TESTdata_45800_DB, TESTdata_45801, ..., TESTdata_45850_DB.

I have tried the following query, but clearly adding "to" doesn't work, and separating them by comma doesn't combine them the way I want:

SELECT TOP 1000 
    [ItemIndex],
    [Data1],
    [Data2],
    [Data3]     
FROM 
    [TESTDB1].[dbo].[TESTdata_45800_DB] (to...) [TESTdata_45850_DB]
WHERE 
    Data1 LIKE 'High' OR Data1 LIKE 'Medium'
ORDER BY 
    Data1
;

Any help would be appreciated.

John K.
  • 480
  • 4
  • 8
Kev W
  • 35
  • 6
  • 4
    "I create a new table every day" why? – njzk2 Oct 22 '22 at 20:55
  • 5
    Don't do this. Have one table and a column to indicate the day; the date, for example. – Stu Oct 22 '22 at 20:55
  • 1
    That is a bad, bad, bad design. one table with a date column – OldProgrammer Oct 22 '22 at 21:36
  • Thanks for the above but for various reasons, I am stuck with this structure. It may not be efficient or considered practise, but it is what I have and still need a solution. Thanks – Kev W Oct 23 '22 at 22:09
  • You asked for a solution. The solution is to use one table with a date column. Your problem is not selecting across multiple tables, it is you are “stuck with this structure”. You need to get unstuck from this structure; SQL is not going to do that. – Dour High Arch Nov 18 '22 at 20:30

2 Answers2

1

A stored procedure using dynamic SQL is a cool approach for this sort of problem. Create a numbers table in memory, use it to create your UNIONed super table, and then execute dynamic SQL to get the result you need:

CREATE PROCEDURE [dbo].[Top1000]
(
    @startDatabaseNumber INT,
    @endDatabaseNumber INT
)
AS
SET NOCOUNT ON

-- inspired by https://stackoverflow.com/a/33146869/7806251
DECLARE @digits TABLE(d INT)
INSERT INTO @digits 
    SELECT 
        d
    FROM 
        (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(d)
 
DECLARE @numbers TABLE(DatabaseNumber VARCHAR(5))       
INSERT INTO @numbers
        SELECT
            LEFT('00000', 5 - LEN(n)) + n AS DatabaseNumber
        FROM
            (
            SELECT 
                CAST(
                    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)
                    ) AS VARCHAR(10)
                ) AS n
            FROM 
                @digits ones, 
                @digits tens, 
                @digits hundreds,
                @digits thousands,
                @digits tenthousands
            ) x
        WHERE
            n BETWEEN @startDatabaseNumber AND @endDatabaseNumber

DECLARE @unionedTable VARCHAR(MAX) = (      
    SELECT
        STRING_AGG(
            CAST(
                '(
                    SELECT
                        ItemIndex,
                        Data1,
                        Data2,
                        Data3
                    FROM
                        [TESTDB1].[dbo].[TESTdata_'+DatabaseNumber+'_DB]
                )'
            AS VARCHAR(MAX)) -- circumvents STRING_AGG() 8000 character constraint
        , ' UNION ALL ') -- just UNION if you want to de-dupe
    FROM
        @numbers
    )

DECLARE @sql NVARCHAR(MAX) = '
    SELECT TOP 1000 
        *
    FROM 
        ('+@unionedTable+') UnionedTable
    WHERE 
        Data1 IN (''High'', ''Medium'')
    ORDER BY 
        Data1
    ;
    '

-- PRINT @sql -- if you need to debug
EXECUTE sp_executesql @sql

Once this is run and exists, call it with whatever start and end numbers you'd like:

EXEC Top1000 5, 15;

Finally, as other commenters have expressed, I'd suggest not creating a new table each day if you can help it. Inserting each day's data into a running table is more natural and better practice.

John K.
  • 480
  • 4
  • 8
  • Thank you for this - I will try and report back. – Kev W Oct 23 '22 at 22:09
  • K - Hi John, suspect this is something very simple but executing the SP gives the following error: Msg 214, Level 16, State 2, Procedure sp_executesql, Line 1 [Batch Start Line 2] Procedure expects parameter '@statement' of type 'ntext/nchar/nvarchar'. – Kev W Oct 24 '22 at 15:35
  • @KevW `sp_executesql` needs an `NVARCHAR` parameter (I overlooked that). Try running the code again. – John K. Oct 24 '22 at 15:54
  • K - Thanks John, that seems to have removed the error. Apologies for my total lack of understanding, but I'm not getting any results returned now - just: Commands completed successfully. Completion time: 2022-10-24T17:09:55.6148376+01:00 – Kev W Oct 24 '22 at 16:12
  • @KevW What DBMS are you using? Are you not getting any results when you run the `EXEC` part, after you've created the procedure? – John K. Oct 24 '22 at 16:29
  • K - yes, SQL Server - could this line be the problem: WHERE Data1 IN (''High'', ''Medium'') - I normally query using WHERE Data1 like 'High' - I'm not familiar with the 'IN' use – Kev W Oct 24 '22 at 16:33
  • K - I have removed the condition and asked it to just return TOP 1000 from unioned table and the result is still zero. – Kev W Oct 24 '22 at 17:18
  • At this point it's hard to know what's going on‚ since it's working for me. I added a `PRINT` statement that you should uncomment for debugging (copy/paste the query that is printed upon `EXEC Top1000 5, 15;` and run it in your GUI). – John K. Oct 24 '22 at 18:40
  • Might have to try another method then. Print @sql seems to do nothing. The only message returned is Completion time: 2022-10-24T21:57:07.2619083+01:00 with no data displayed. It's as if the unionised table is blank. Thanks for putting in the time though – Kev W Oct 24 '22 at 20:59
  • Hi John, what I can't understand however, is why when I PRINT @sql rather than execute, there is nothing there. – Kev W Oct 25 '22 at 08:22
  • Hi John, I have found the issue. I had been trying to extract data from tables using numbers above 10000 - I've added the line '@digits tenthousands' and it now can use numbers I was trying (44850 - 44855) – Kev W Oct 25 '22 at 11:52
  • @KevW Ah, great! I will update the SQL in the answer to reflect that. Glad it ended up working. – John K. Oct 25 '22 at 11:58
1

If you want a union of tables, meaning all the rows of all the tables dumped into one row set, then use the SQL UNION. For example (this is suggested code, I have not tested it but UNION is very standard SQL and should work everywhere)

SELECT TOP (1000) [ItemIndex],[Data1],[Data2],[Data3]
 from 
   (select [Data1],[Data2],[Data3]
   FROM [TESTDB1].[dbo].[TESTdata_00005_DB] 
   )      
 UNION          
   (select [Data1],[Data2],[Data3]
   FROM [TESTDB1].[dbo].[TESTdata_00006_DB] 
   )      
  UNION
   (select [Data1],[Data2],[Data3]
   FROM [TESTDB1].[dbo].[TESTdata_00007_DB] 
   )      
 WHERE Data1 like 'High' OR Data1 like 'Medium'
  ORDER BY Data1

By the way, though UNION works quite well it is not good performance and in my experience doing a bunch of unions may grow to be difficult to support in the long run. At some point you might consider using alternate data architecture such as a single table but with multiple partitions, one for each day.

  • Thanks @Robert Hamilton - I tried this but get the syntax error after the first UNION... Expecting '(',ALL, or SELECT. – Kev W Oct 23 '22 at 22:04
  • For many reasons, there has to be a new table each day due to data volumes and weeding etc. I will research UNION a little more and see if I can find why it doesn;t work in this instance – Kev W Oct 23 '22 at 22:07
  • 1
    @KevW see the edited. Union requires the inline table structure as shown. Also the top 100 clause would require it. – Robert Hamilton Oct 24 '22 at 00:20
  • Thanks Robert - this is so simple yet effective. The query ends up being quite long for a 30 day run, but not really an issue. – Kev W Oct 25 '22 at 08:04