2

I have 2 tables. Table A has Date, ISBN (for Book), Demand(demand for that date). Table B has Date, ISBN (for Book), and SalesRank.

The sample data is as follows: The DailyBookFile has 150k records for each date, from year 2010 (i.e. 150k * 365 days * 8 years) rows. Same goes with SalesRank Table having about 500k records for each date

DailyBookFile       
Date        Isbn13         CurrentModifiedDemandTotal
20180122    9780955153075   13
20180122    9780805863567   9
20180122    9781138779396   1
20180122    9780029001516   9
20180122    9780470614150   42

SalesRank       
importdate  ISBN13          SalesRank
20180122    9780029001516   69499
20180122    9780470614150   52879
20180122    9780805863567   832429
20180122    9780955153075   44528
20180122    9781138779396   926435

Required Output     
Date        Avg_Rank    Book_Group
20180122    385154  Elite
20180121    351545  Elite
20180120    201545  Elite

I want to get the Top 200 CurrentModifiedDemand for each day, and take the average Rank.

I am unable to work out a solution as I am new to SQL.

I started with getting the Top 200 CurrentModifiedDemand for yesterday and get the Avg Rank over last year.

SELECT DBF.Filedate AS [Date],
       AVG(AMA.SalesRank) AS Avg_Rank,
       'Elite' AS Book_Group 
FROM [ODS].[wholesale].[DailyBookFile] AS DBF
INNER JOIN [ODS].[MarketplaceMonitor].[SalesRank] AS AMA ON (DBF.Isbn13 = AMA.ISBN13
                                                        AND DBF.FileDate = AMA.importdate)
WHERE DBF.Isbn13 IN (SELECT TOP 200 Isbn13
                     FROM [ODS].[wholesale].[DailyBookFile]
                     WHERE FileDate = 20180122
                       AND CAST(CurrentModifiedDemandTotal AS int) > 200)
  AND DBF.Filedate > 20170101
GROUP BY DBF.Filedate;

But the result is not what I want. So, now I want the ISBN for the Top 200 CurrentModifiedDemand for each day and their avg rank. I tried with this.

DECLARE @i int;
SET @i = 20180122;
WHILE (SELECT DISTINCT(DBF.Filedate)
       FROM [ODS].[wholesale].[DailyBookFile] AS DBF
       WHERE DBF.Filedate = @i) IS NOT NULL
BEGIN

    SELECT DBF.Filedate AS [Date],
           AVG(AMA.SalesRank) AS Avg_Rank,
           'Elite' AS Book_Group 
    FROM [ODS].[wholesale].[DailyBookFile] AS DBF
    INNER JOIN [ODS].[MarketplaceMonitor].[SalesRank] as AMA ON DBF.Isbn13 = AMA.ISBN13
                                                            AND DBF.FileDate = AMA.importdate
    WHERE DBF.Isbn13 in (SELECT TOP 200 Isbn13
                         FROM [ODS].[wholesale].[DailyBookFile]
                         WHERE FileDate = @i
                           AND CAST (CurrentModifiedDemandTotal AS int) > 500)
      AND DBF.Filedate = @i
    GROUP BY DBF.Filedate;

    SET @i = @i+1;

END

In this I am getting one select query result in each window. Is there any way to have the result in a single table?

P.S. The list of top 200 books every day will change according to the CurrentModifiedDemand. I want to take their avg. sales rank for that day.

Neil S
  • 229
  • 7
  • 20
  • You've tagged `mysql` and `sql-server` here? Which are you using? Also, using a `WHILE` loop is a bad idea. Could you post some DDL, consumable Sample data and expected output? There will likely be a much better dataset approach for this. – Thom A Jan 23 '18 at 16:06
  • 1
    is this question for mysql or for sql-server ? they are both different – GuidoG Jan 23 '18 at 16:06
  • Regardless of the DBMS don't be scared to use some white space and format your queries. This is far more difficult to read than it needs to be. – Sean Lange Jan 23 '18 at 16:08
  • So sorry, I am using sql-server. Ill try to re-word this. – Neil S Jan 23 '18 at 16:12
  • IN your expected output, you have values for dates `'20180121'` and `'20180120'`, however, there's no sample data for it. Where do these values come from? – Thom A Jan 23 '18 at 16:53
  • @Larnu every date has values for ISBNs, I though of just showing the sample data for 1 date and expected output for multiple dates. Sorry if it was confusing for you. – Neil S Jan 23 '18 at 17:14
  • Wouldn't a function (or stored procedure, whatever it would be called in SQL Server) make more sense, where you can pass in the given date parameter and it would return the results based on that date? You could even have an optional count parameter, so it could be like TopBooksByDate(, ) with count defaulting to 200. – Anthony Jan 23 '18 at 18:52

2 Answers2

2

Instead of immediately selecting in each iteration of the loop, you can insert rows to temp table (or table-type variable) and select everything after the loop finishes:

IF OBJECT_ID('tempdb..#books') IS NOT NULL
BEGIN
    DROP TABLE #books
END

CREATE TABLE #books (
    [Date] INT,
    [Avg_Rank] FLOAT,
    [Book_Group] VARCHAR(512)
);

DECLARE @i int;
SET @i = 20180122;

BEGIN TRY
WHILE (SELECT DISTINCT(DBF.Filedate)
    FROM [ODS].[wholesale].[DailyBookFile] AS DBF
    WHERE DBF.Filedate = @i) IS NOT NULL
BEGIN

    INSERT INTO #books (
        [Date],
        [Avg_Rank],
        [Book_Group]
    )
    SELECT DBF.Filedate AS [Date],
        AVG(AMA.SalesRank) AS Avg_Rank,
        'Elite' AS Book_Group 
    FROM [ODS].[wholesale].[DailyBookFile] AS DBF
    INNER JOIN [ODS].[MarketplaceMonitor].[SalesRank] as AMA ON DBF.Isbn13 = AMA.ISBN13
                                                            AND DBF.FileDate = AMA.importdate
    WHERE DBF.Isbn13 in (SELECT TOP 200 Isbn13
                        FROM [ODS].[wholesale].[DailyBookFile]
                        WHERE FileDate = @i
                        AND CAST (CurrentModifiedDemandTotal AS int) > 500)
    AND DBF.Filedate = @i
    GROUP BY DBF.Filedate;

    SET @i = @i+1;

END
END TRY
BEGIN CATCH
    IF OBJECT_ID('tempdb..#books') IS NOT NULL
    BEGIN
        DROP TABLE #books
    END
END CATCH

SELECT *
FROM #books

DROP TABLE #books

Using table-type variable would yield simpler code, but when storing large amounts of data table-type variables start losing in performance against temp tables. I'm not sure how many rows is a cut-off, but in my experience I've seen significant performance gains from changing table-type var to temp table at 10000+ row counts. For small row counts an opposite might apply.

Justinas Marozas
  • 2,482
  • 1
  • 17
  • 37
  • 1
    You mention the use of the Table Variable, cost, which is good, but you still use a `WHILE` Loop. This is going to have a high cost unto itself as well. :) – Thom A Jan 23 '18 at 17:02
  • Good point, @Larnu . I was too focused on question bit asking how to join results in a while loop :) – Justinas Marozas Jan 23 '18 at 17:11
1

This avoids a costly WHILE loop, and I believe achieves your goal:

CREATE TABLE #DailyBookFile ([Date] date,
                            Isbn13 bigint,
                            CurrentModifiedDemandTotal tinyint);

INSERT INTO #DailyBookFile
VALUES ('20180122',9780955153075,13),
       ('20180122',9780805863567,9 ),
       ('20180122',9781138779396,1 ),
       ('20180122',9780029001516,9 ),
       ('20180122',9780470614150,42);

CREATE TABLE #SalesRank (importdate date,
                        ISBN13 bigint,
                        #SalesRank int);
INSERT INTO #SalesRank
VALUES ('20180122',9780029001516,69499 ),
       ('20180122',9780470614150,52879 ),
       ('20180122',9780805863567,832429),
       ('20180122',9780955153075,44528 ),
       ('20180122',9781138779396,926435);
GO
WITH Ranks AS(
    SELECT SR.*,
           RANK() OVER (PARTITION By SR.importdate ORDER BY SR.#SalesRank) AS Ranking
    FROM #SalesRank SR
         JOIN #DailyBookFile DBF ON SR.ISBN13 = DBF.Isbn13
                               AND SR.importdate = DBF.[Date])
SELECT importdate AS [Date],
       AVG(#SalesRank) AS Avg_rank,
       'Elite' AS Book_Group
FROM Ranks
WHERE Ranking <= 200
GROUP BY importdate;

GO
DROP TABLE #DailyBookFile;
DROP TABLE #SalesRank;
Thom A
  • 88,727
  • 11
  • 45
  • 75