7

I have the following query, which returns all merchants in the database who have transactions between two given dates.

SELECT distinct me.id, me.merchant_num, me.merchant_nm, count(1) as num_transactions
       ,CASE WHEN me.status = 'A' THEN 'Yes' ELSE 'No' END as production_mode
FROM merchant_t me
LEFT OUTER JOIN transaction_t tt
    ON tt.merchant_id = me.id 
    AND tt.transaction_dt BETWEEN '2020-04-01' and '2020-04-30'
WHERE me.status = 'T'
GROUP BY me.id, me.merchant_num, me.merchant_nm, me.status

But, there is additional information I need to include in the query. I've written the enhanced query below, but it results in duplicated merchants. I know the query could be written better, but that's the limit of my SQL knowledge.

I need to keep the same number of results as the above query.

SELECT distinct me.id, me.merchant_num, me.merchant_nm, count(1) as num_transactions
       ,CASE WHEN me.status = 'A' THEN 'Yes' ELSE 'No' END as production_mode   

       -- additional information needed as below, with comments

       -- the date of the last transaction that uses IBank (IBA)
       ,(select max(transaction_dt) 
           from transaction_t where merchant_id = me.id 
           and tt.bank_txt = 'IBA') as last_ibank_transaction

       -- the value of the "trans_live" column for the merchant's most recent transaction
       ,(select top 1 trans_live          
           from transaction_t
           where merchant_id = me.id order by transaction_dt desc) is_live
FROM merchant_t me
    LEFT OUTER JOIN transaction_t tt
         ON tt.merchant_id = me.id 
         AND tt.transaction_dt BETWEEN '2020-04-01' and '2020-04-30'
    WHERE me.status = 'T'
    GROUP BY me.id, me.merchant_num, me.merchant_nm, me.status, tt.bank_txt

This returns duplicated merchants. In the screenshot below, I have filtered on a single merchant to show the duplication. The first result set is from the original query. The second is from the updated query. "DISTINCT" does its job but I only want to see one merchant record.

enter image description here

There are two tables involved:

merchant_t
----------
id     merchant_num    merchant_nm    status


transaction_t
--------------
id     merchant_id     transaction_dt     trans_live     bank_txt

EDIT

I am trying to avoid having the dates inside a subquery

VARUN'S ANSWER

The updated query returns many duplicates as below.

MAX(CASE WHEN tt.bank_txt = 'IBA'

SQL SCRIPT

USE [XYZ]
GO
/****** Object:  Table [dbo].[merchant_t]    Script Date: 6/10/2020 5:23:51 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[merchant_t](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [merchant_num] [nvarchar](50) NULL,
    [merchant_nm] [nvarchar](100) NULL,
    [status] [nchar](1) NULL
) ON [PRIMARY]
GO
/****** Object:  Table [dbo].[transaction_t]    Script Date: 6/10/2020 5:23:51 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[transaction_t](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [merchant_id] [int] NULL,
    [transaction_dt] [datetime] NULL,
    [trans_live] [bit] NULL,
    [bank_txt] [nvarchar](30) NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[merchant_t] ON 
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (335, N'PriceBusterDVD_NZ_AN', N'Pricebuster NZ ANZ', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (341, N'T6400050', N'Merco Test Merchant', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (342, N'6400262', N'Musac School Test 1', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (366, N'T6400093', N'Paystation Limited Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (367, N'T6400435', N'PB Technologies Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (374, N'PriceBusterDVD_NZ_BN', N'Pricebuster NZ BNZ', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (389, N'TAirNewZealandNZ_All', N'Test Air NZ - All', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (390, N'T6400061', N'The Warehouse Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (392, N'T6400246', N'University of Waikato Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (538, N'T6400449', N'NZTA Payments Dev System Testing', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (544, N'T6400447', N'NZTA Tolling Dev System Testing', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (631, N'SS64000475', N'Smeedi Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (729, N'T6400048', N'Marram Community Trust Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (776, N'SS64000665', N'POLi Test Bench', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (936, N'T6400002', N'WorldRemit Test NZ', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (1033, N'SS64005103', N'Ahura Consulting Limited', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (1173, N'SS64005386', N'Warehouse Stationery Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (1236, N'SS64005423', N'KlickEx Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (1435, N'T6400477', N'NZMCA', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (1580, N'T6400478', N'2 Degrees Mobile - Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (1626, N'SS64006121', N'Property Council New Zealand', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (1714, N'SS64006558', N'ServiceFinder.co.nz', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (1972, N'T6400480', N'2 Degrees Mobile Accept Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (1988, N'T6400484', N'HelloClub Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (2011, N'T6400482', N'Horowhenua District Council Test', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (2260, N'SS64008067', N'Success Global', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (2274, N'64007479', N'Samsung Electronics New Zealand', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (2397, N'SS64008228', N'MyBitcoinSaver.com', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (2418, N'6400478', N'Spark Staging', N'T')
GO
INSERT [dbo].[merchant_t] ([id], [merchant_num], [merchant_nm], [status]) VALUES (2441, N'SS64008239', N'Kiwi Petz', N'T')
GO
SET IDENTITY_INSERT [dbo].[merchant_t] OFF
GO
SET IDENTITY_INSERT [dbo].[transaction_t] ON 
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215957, 389, CAST(N'2020-04-01T06:55:37.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215958, 389, CAST(N'2020-04-01T06:56:13.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215959, 389, CAST(N'2020-04-01T07:06:47.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215960, 389, CAST(N'2020-04-01T07:09:09.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215961, 389, CAST(N'2020-04-01T08:19:28.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215962, 389, CAST(N'2020-04-01T08:19:45.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215963, 389, CAST(N'2020-04-01T08:19:57.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215964, 389, CAST(N'2020-04-01T08:20:16.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215965, 389, CAST(N'2020-04-01T09:33:40.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215966, 389, CAST(N'2020-04-01T09:33:46.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215967, 389, CAST(N'2020-04-01T11:05:35.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215968, 389, CAST(N'2020-04-01T11:06:13.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215969, 389, CAST(N'2020-04-01T11:12:51.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215970, 389, CAST(N'2020-04-01T11:17:38.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215971, 389, CAST(N'2020-04-01T12:45:13.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215972, 389, CAST(N'2020-04-01T12:45:49.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215973, 389, CAST(N'2020-04-01T12:50:17.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18201776, 1580, CAST(N'2020-04-01T14:16:38.000' AS DateTime), 0, N'ANZ')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215974, 389, CAST(N'2020-04-01T12:59:06.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215975, 389, CAST(N'2020-04-01T14:18:13.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215976, 389, CAST(N'2020-04-01T14:18:21.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215977, 389, CAST(N'2020-04-01T14:26:15.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215978, 389, CAST(N'2020-04-01T14:33:02.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215979, 389, CAST(N'2020-04-01T15:12:43.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215980, 389, CAST(N'2020-04-01T15:12:46.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215981, 389, CAST(N'2020-04-01T15:41:14.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18215982, 389, CAST(N'2020-04-01T15:43:53.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18223719, 2441, CAST(N'2020-04-02T10:20:03.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231457, 389, CAST(N'2020-04-02T07:11:32.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231458, 389, CAST(N'2020-04-02T07:13:06.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231459, 389, CAST(N'2020-04-02T08:49:33.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231460, 389, CAST(N'2020-04-02T09:03:17.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231461, 389, CAST(N'2020-04-02T09:08:39.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231462, 389, CAST(N'2020-04-02T11:17:25.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231463, 389, CAST(N'2020-04-02T11:20:53.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231464, 389, CAST(N'2020-04-02T11:22:38.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231465, 389, CAST(N'2020-04-02T12:03:20.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231466, 389, CAST(N'2020-04-02T12:03:58.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231467, 389, CAST(N'2020-04-02T12:20:27.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18231468, 389, CAST(N'2020-04-02T12:37:47.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18233881, 341, CAST(N'2020-04-02T13:54:30.000' AS DateTime), 0, N'ANZ')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18237848, 2011, CAST(N'2020-04-03T13:31:36.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18245130, 389, CAST(N'2020-04-03T06:24:34.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18245131, 389, CAST(N'2020-04-03T06:26:47.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18245132, 389, CAST(N'2020-04-03T10:55:09.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18245133, 389, CAST(N'2020-04-03T11:00:42.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18245134, 389, CAST(N'2020-04-03T11:00:57.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18245135, 389, CAST(N'2020-04-03T14:01:58.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18245136, 389, CAST(N'2020-04-03T14:08:48.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18245137, 389, CAST(N'2020-04-03T14:41:57.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18245138, 389, CAST(N'2020-04-03T14:44:40.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18250367, 2441, CAST(N'2020-04-04T09:53:19.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18274546, 2274, CAST(N'2020-04-06T02:50:01.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18274563, 1714, CAST(N'2020-04-06T06:36:45.000' AS DateTime), 1, N'ANZ')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277532, 389, CAST(N'2020-04-06T06:37:57.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277533, 389, CAST(N'2020-04-06T06:46:45.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277534, 389, CAST(N'2020-04-06T07:36:29.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277535, 389, CAST(N'2020-04-06T07:43:00.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277536, 389, CAST(N'2020-04-06T08:11:22.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277537, 389, CAST(N'2020-04-06T08:12:13.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18279767, 341, CAST(N'2020-04-06T09:40:12.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18268407, 1033, CAST(N'2020-04-06T20:15:56.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18270264, 544, CAST(N'2020-04-06T12:36:12.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18270265, 544, CAST(N'2020-04-06T12:37:11.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18270266, 544, CAST(N'2020-04-06T12:38:08.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18270267, 544, CAST(N'2020-04-06T12:40:05.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18270268, 544, CAST(N'2020-04-06T12:42:50.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18270269, 544, CAST(N'2020-04-06T12:43:36.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18270270, 544, CAST(N'2020-04-06T12:45:09.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18270271, 544, CAST(N'2020-04-06T12:45:26.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18273055, 538, CAST(N'2020-04-06T12:34:02.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277538, 389, CAST(N'2020-04-06T11:30:42.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277539, 389, CAST(N'2020-04-06T11:31:06.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277540, 389, CAST(N'2020-04-06T11:44:56.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277541, 389, CAST(N'2020-04-06T11:49:43.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277542, 389, CAST(N'2020-04-06T13:07:37.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277543, 389, CAST(N'2020-04-06T13:08:50.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277544, 389, CAST(N'2020-04-06T14:10:14.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277545, 389, CAST(N'2020-04-06T14:10:37.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277546, 389, CAST(N'2020-04-06T14:10:54.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277547, 389, CAST(N'2020-04-06T14:14:26.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277548, 389, CAST(N'2020-04-06T15:25:39.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277549, 389, CAST(N'2020-04-06T15:26:02.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277550, 389, CAST(N'2020-04-06T18:31:10.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277551, 389, CAST(N'2020-04-06T18:33:05.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277552, 389, CAST(N'2020-04-06T18:44:58.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277553, 389, CAST(N'2020-04-06T19:46:52.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277554, 389, CAST(N'2020-04-06T20:31:33.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277555, 389, CAST(N'2020-04-06T20:43:56.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277556, 389, CAST(N'2020-04-06T20:51:42.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18277557, 389, CAST(N'2020-04-06T21:03:01.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18293723, 389, CAST(N'2020-04-07T06:20:15.000' AS DateTime), 0, N'IBA')
GO
INSERT [dbo].[transaction_t] ([id], [merchant_id], [transaction_dt], [trans_live], [bank_txt]) VALUES (18293724, 389, CAST(N'2020-04-07T06:45:17.000' AS DateTime), 0, N'IBA')
GO

LATEST VERSION RESULTS

enter image description here

INDEXES

merchant_t

/****** Object:  Index [PK_merchant_t]    Script Date: 6/13/2020 8:40:44 PM ******/
ALTER TABLE [dbo].[merchant_t] ADD  CONSTRAINT [PK_merchant_t] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

/****** Object:  Index [IX_status]    Script Date: 6/13/2020 8:44:08 PM ******/
CREATE NONCLUSTERED INDEX [IX_status] ON [dbo].[merchant_t]
(
    [status] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

transaction_t

/****** Object:  Index [IX_merchant_id]    Script Date: 6/13/2020 8:48:08 PM ******/
CREATE NONCLUSTERED INDEX [IX_merchant_id] ON [dbo].[transaction_t]
(
    [merchant_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


/****** Object:  Index [IX_transaction_dt]    Script Date: 6/13/2020 8:45:59 PM ******/
CREATE NONCLUSTERED INDEX [IX_transaction_dt] ON [dbo].[transaction_t]
(
    [transaction_dt] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

/****** Object:  Index [IX_trans_live]    Script Date: 6/13/2020 8:46:57 PM ******/
CREATE NONCLUSTERED INDEX [IX_trans_live] ON [dbo].[transaction_t]
(
    [trans_live] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

/****** Object:  Index [IX_merchant_id]    Script Date: 6/13/2020 8:47:17 PM ******/
CREATE NONCLUSTERED INDEX [IX_merchant_id] ON [dbo].[transaction_t]
(
    [merchant_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

/****** Object:  Index [IX_bank_txt]    Script Date: 6/13/2020 8:47:37 PM ******/
CREATE NONCLUSTERED INDEX [IX_bank_txt] ON [dbo].[transaction_t]
(
    [bank_txt] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
user460114
  • 1,848
  • 3
  • 31
  • 54
  • The result being duplicate is the Last_ibank_Transaction field as in the shown record. Just wanted to understand do you want to show MAX last_bank_Transaction for a merchant. – Rajat Jaiswal May 29 '20 at 05:59
  • Yes, I do, along with all the other additional data described in the second query. – user460114 May 29 '20 at 07:37
  • As for the additional columns: do you want the date of the last IBA transaction over the whole transactions table, or just for the period that the rest of the query is filtering on (ie April 2020 only)? Same question for the `is_live` column. – GMB Jun 02 '20 at 00:14
  • Those should be over the whole transactions table, not filtered by dates. – user460114 Jun 02 '20 at 09:12
  • I updated my query, give it another try – Wouter Jun 03 '20 at 06:13
  • @user460114, your second query returns different number of rows because it has a different `GROUP BY` clause. I've added an answer with more details. – Vladimir Baranov Jun 11 '20 at 23:19
  • My answer update after you provided script – KumarHarsh Jun 12 '20 at 07:48
  • 2
    Your original statement was "all merchants in the database who have transactions between two given dates". Because you are doing an outer join on your transactions, you are actually getting a list of all merchants whether or not they had such a transaction. So converting that to a hard JOIN would be a start. If you really want all, with just a transaction count of 0, then the outer join is OK, but the COUNT should be on a field in the transaction record, such as its ID, as COUNT of a null field would be zero. – wordragon Jun 13 '20 at 22:32
  • You get those "duplicate" rows because you group by `tt.bank_txt` but don't select it. Either you need this info then add it to the select list, or you don't then remove it from group by. Currently DISTINCT is plain wrong. – dnoeth Jun 14 '20 at 10:37

8 Answers8

2

In your first query you have this clause:

GROUP BY me.id, me.merchant_num, me.merchant_nm, me.status

In your second query you have this clause:

GROUP BY me.id, me.merchant_num, me.merchant_nm, me.status, tt.bank_txt

You are getting different number of rows in the second query because your GROUP BY is different. You need to keep it the same.


Here is one way to do it. I used CTE to make the query readable.

As you can see in the comments in the query, it returns the value of the trans_live column for the merchant's most recent transaction. Most recent transaction of the merchant across all dates, regardless of the filter in the CTE. Same with the date of the last transaction that uses IBank (IBA). It returns the last transaction of the merchant across all dates, regardless of the filter in the CTE.

It is not clear from the question if this is what you want, though. Please clarify.

WITH
CTE_Merchants
AS
(
    SELECT -- distinct
        -- you don't need distinct here, because GROUP BY does it
        -- try to comment out distinct, you should get exactly the same result
        me.id, me.merchant_num, me.merchant_nm
        ,count(tt.id) as num_transactions
        -- return 0 count for merchants that don't have any transactions
        -- within the given range of dates

        ,CASE WHEN me.status = 'A' THEN 'Yes' ELSE 'No' END as production_mode
    FROM
        merchant_t me
        LEFT OUTER JOIN transaction_t tt
            ON  tt.merchant_id = me.id 
            AND tt.transaction_dt BETWEEN '2020-04-01' and '2020-04-30'
    WHERE me.status = 'T'
    GROUP BY me.id, me.merchant_num, me.merchant_nm, me.status
)
SELECT
    CTE_Merchants.id
    ,CTE_Merchants.merchant_num
    ,CTE_Merchants.merchant_nm
    ,CTE_Merchants.num_transactions
    ,CTE_Merchants.production_mode
    ,A1.is_live
    ,A2.last_IBA_transaction_dt
FROM
    CTE_Merchants
    OUTER APPLY
    (
        -- use OUTER APPLY (not CROSS APPLY) in case there are merchants with no transactions
        -- The value of the "trans_live" column for the merchant's most recent transaction.
        -- Most recent across all dates, regardless of the filter in the CTE
        select top 1
            transaction_t.trans_live AS is_live
        from transaction_t
        where
            transaction_t.merchant_id = CTE_Merchants.id
        order by transaction_dt desc
    ) AS A1
    OUTER APPLY
    (
        -- use OUTER APPLY (not CROSS APPLY) in case there are merchants with no transactions
        -- The date of the last transaction that uses IBank (IBA).
        -- The last transaction across all dates, regardless of the filter in the CTE
        select top 1
            transaction_t.transaction_dt AS last_IBA_transaction_dt
        from transaction_t
        where
            transaction_t.merchant_id = CTE_Merchants.id
            and transaction_t.bank_txt = 'IBA'
        order by transaction_dt desc
    ) AS A2
;

For this query to work efficiently, I would recommend to create certain indexes. You already have an index IX_status on the merchant_t table. This is good.

Those indexes that you have on the transaction_t table are not the best for this kind of query. You existing indexes IX_transaction_dt, IX_trans_live, IX_bank_txt are not useful for this query. Even IX_merchant_id by itself is not really useful, especially if you replace it with a composite index on (merchant_id, transaction_dt), as I've shown below.

I hope you do have a clustered primary key on id in transaction_t table. Similar to the primary key in the merchant_t table. If not, I'd create it.

Then, for efficient join between merchant_t and transaction_t tables and for efficient retrieval of the latest trans_live we need the following index:

CREATE NONCLUSTERED INDEX [IX_merchant_id_transaction_dt] ON [dbo].[transaction_t]
(
    merchant_id,
    transaction_dt -- you can put DESC here, but it should not matter
) INCLUDE (trans_live)

The order of columns in this index is important.

You have an index on just transaction_dt. This query might use it, but it would be not as efficient as on (merchant_id, transaction_dt). Your current index is useful if you have queries that filter by transaction date without looking at the merchant_id. If you don't have such queries, you'd better drop it. Having too many indexes "just in case" may be a problem for optimizer, and it slows down updates and inserts.

For efficient retrieval of the last last_IBA_transaction_dt we'll need this index:

CREATE NONCLUSTERED INDEX [IX_merchant_id_bank_txt_transaction_dt] ON [dbo].[transaction_t]
(
    merchant_id,
    bank_txt,
    transaction_dt -- you can put DESC here, but it should not matter
)

Again, the order of columns in this index is important.


If you don't want to create two indexes specifically for this query, you can try just one index, which is great for getting trans_live and should somewhat help with last_IBA_transaction_dt.

CREATE NONCLUSTERED INDEX [IX3] ON [dbo].[transaction_t]
(
    merchant_id,
    transaction_dt DESC
) INCLUDE (trans_live, bank_txt)

You can try and measure performance with the first two indexes and then just with the third one and compare.


By the way, if there are merchants that don't have any transactions within the given range of dates, then the query will return a row for these merchants. The COUNT in your original query will return 1 for these merchants. Probably this is not what you want.

To return 0 count for these merchants, the COUNT function should be COUNT(tt.id). I have made these changes in the code above.

Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • 1
    Vladimir, this does work thanks. And the correct number of rows seems to be returning. The only problem is that the date range is inside a subquery. As I mentioned in my original post, I need to use the query in an excel report, where I will parametize the date range. However, excel doesn't seem to allow the creation of parameters when the variables are inside a subquery. – user460114 Jun 12 '20 at 09:27
  • @user460114, I think it is better to ask another question specifically about Excel and passing parameters into an SQL query (with CTE). I don't know anything about Excel, but it should not matter for Excel where these parameters are inside the query. Also, if everything else fails, you can always wrap your query into a stored procedure with parameters or into a table-valued function with parameters and call those from Excel. BTW, which of the two variants of the query logic you actually needed? – Vladimir Baranov Jun 12 '20 at 12:05
  • Yes, you are right. I will do that. Your first option is the one I need. Can you suggest any way to make your query run faster? It takes 90 seconds to fully load. Praful's answer runs in under 2 seconds, though his answer needs to be tweaked. – user460114 Jun 13 '20 at 05:17
  • @user460114, well, it doesn't matter how fast the query runs if it returns wrong results, right? The optimal query depends on how your data distribution looks like. What indexes do you have already? Please update the question with `CREATE INDEX` statements. Can you create new indexes? How many rows are in the `merchants` table? How many rows in the `merchants` table have `status = 'T'`? How many rows are in the `transactions` table? How many transactions per merchant in total? How many transactions per merchant within the typical range of dates (a month)? – Vladimir Baranov Jun 13 '20 at 05:45
  • 1
    Vladimir, I've updated the question with "create index" scripts. Yes, I can create new indexes if necessary. Approximately 3500 merchant records in total. Approximately 19 million historic records in transactions table. So that gives you an idea of the average total per merchant over time. Only 683 merchants have status = 'T'. Per month, transactions per merchant range up to max 40,000. – user460114 Jun 13 '20 at 09:07
  • 1
    @user460114, thank you for this extra info. I have added my recommendations for indexes to the answer. – Vladimir Baranov Jun 13 '20 at 10:30
  • 1
    Nice, it ran in two seconds with those two indexes. You're a magician. – user460114 Jun 15 '20 at 06:57
  • @VladimirBaranov, Even if transaction_t 's id column is not directly not involve in this query, it should be primary key.Then again querying transaction_t tt on multiple occasion, with so many column included in Group BY.Also So many Non clustered index in transaction_t. not require. I didn't recommened index because I wanted to be sure about given table structure whether it is real or not. RESPECT – KumarHarsh Jun 15 '20 at 14:22
  • Vladimir, I've created a new bounty question based on dnoeth's answer, not because it's better than yours, but because it's nearly ready for Excel except for a single line. Feel free to contribute if you want. https://stackoverflow.com/questions/62383674/parametizing-query-for-excel – user460114 Jun 20 '20 at 01:18
2

This tries to get rid of the two Scalar Subqueries/Outer Apply. It runs a single aggregation on the transaction table using a conditional aggregate and the old piggyback trick to get the latest trans_live.

SELECT
   me.id
  ,me.merchant_num
  ,me.merchant_nm
   -- return 0 count for merchants that don't have any transactions
   -- within the given range of dates
  ,Count(CASE WHEN tt.transaction_dt BETWEEN '2020-04-01' AND '2020-04-30' THEN tt.id end) AS num_transactions
   -- the date of the last transaction that uses IBank (IBA)
  ,CASE WHEN me.status = 'A' THEN 'Yes' ELSE 'No' END AS production_mode   
  ,Max(CASE WHEN bank_txt = 'IBA' THEN tt.transaction_dt END) AS last_IBA_transaction_dt
      -- the value of the "trans_live" column for the merchant's most recent transaction
      -- piggybacks the trans_live on the max transaction_dt
      -- similar to FIRST_VALUE
     ,convert(bit, Substring(Max(convert(CHAR(8), tt.transaction_dt, 112) + convert(CHAR(1), trans_live)), 9, 1)) AS is_live
FROM merchant_t AS me
LEFT JOIN transaction_t AS tt
  ON tt.merchant_id = me.id 
WHERE me.status = 'T' 
GROUP BY 
   me.id
  ,me.merchant_num
  ,me.merchant_nm
  ,me.status

Don't ask me about adding indexes for improving performance, I don't work with SQL Server anymore and my main DBMS Teradata usually runs fast without adding indexes for specific queries :-)

dnoeth
  • 59,503
  • 4
  • 39
  • 56
  • Clever trick with `trans_live`. Performance is just a matter of what is faster: read all 19M transactions once or seek index in that table ~1400 times. – Vladimir Baranov Jun 15 '20 at 23:54
  • @VladimirBaranov: In Teradata it's processing all 19M rows :-) – dnoeth Jun 16 '20 at 06:52
  • @dnoeth, thanks for this. This works well too. I've upvoted your answer. This answer is actually more usable for Excel. It nearly works except for one line. I've created a new question explaining it. Feel free to contribute if you can. https://stackoverflow.com/questions/62383674/parametizing-query-for-excel – user460114 Jun 20 '20 at 01:15
0

You don't need the DISTINCT, that's already taken care of by the GROUP BY.

Problem is that the columns that you GROUP BY are different than the columns in your SELECT.

You could use a subquery like this:

WITH cte_merc
as
(
    SELECT merc.id,merc.status,merc.merchant_nm,  merc.merchant_num,   
        (   select max(transaction_dt) 
           from transaction_t 
           where merchant_id = merc.id 
           and bank_txt = 'IBA') as last_ibank_transaction

       -- the value of the "trans_live" column for the merchant's most recent transaction
       ,(   select top 1 trans_live          
           from transaction_t
           where merchant_id = merc.id 
           order by transaction_dt desc) is_live

    FROM merchant_t merc
)

SELECT me.id, me.merchant_num, me.merchant_nm, count(1) as num_transactions
       ,me.is_live ,me.last_ibank_transaction 
FROM cte_merc me
INNER JOIN transaction_t tt
    ON tt.merchant_id = me.id 
WHERE me.status = 'T'
AND tt.transaction_dt BETWEEN '2020-04-01' and '2020-04-30'
GROUP BY me.id, me.merchant_num, me.merchant_nm, me.status    ,me.is_live ,me.last_ibank_transaction  
Wouter
  • 2,881
  • 2
  • 9
  • 22
  • Thanks for that. I probably should have said this in the original post, but I am trying to avoid having the dates inside a subquery because this query will be used in an excel report and I need to parametise the transaction dates. Excel doesn't seem to like it when the parametised columns are buried inside a subquery. – user460114 May 29 '20 at 07:41
  • In that case you can replace the subquery with a case: max(CASE WHEN tt.bank_txt = 'IBA' THEN transaction_dt END) – Wouter May 29 '20 at 08:29
  • I tested your query. The subquery needs a group by clause: GROUP BY me.id, me.merchant_nm, me.merchant_num, me.status, tt.bank_txt. And I'm still getting duplicates unfortunately. – user460114 May 29 '20 at 09:06
0

I think this should help. With the left outer join, Your query suggested that you need merchants irrespective of if they made a transaction or not between a period. It can be useful in cases where you want to see or compare the merchants that made transactions in the last month but no transactions in current month and might need business attention.

If you only want the merchants with transactions date between the period then yes INNER JOIN (JOIN) is what is needed.

declare @trans_start_dt date = '2020-04-01'
       ,@trans_end_dt date = '2020-04-30'

;WITH trans_tt 
 AS (Select merchant_id,
       COUNT(*) as num_transactions,
       MAX(CASE WHEN tt.bank_txt = 'IBA' 
                THEN transaction_dt
                ELSE NULL END) as [last_ibank_transaction]
FROM transaction_t
WHERE transaction_dt BETWEEN @trans_start_dt and @trans_end_dt
GROUP BY merchant_id
)

SELECT DISTINCT  
       me.id, 
       me.merchant_num, 
       me.merchant_nm, 
       tt.num_transactions as num_transactions
   ,[production_mode] = CASE WHEN me.status = 'A' 
                             THEN 'Yes' 
                             ELSE 'No' 
                             END   

 -- the date of the last transaction that uses IBank (IBA)
  ,tt.[last_ibank_transaction]
 -- the value of the "trans_live" column for the merchant's most recent transaction
  ,[is_live] = (select top 1 trans_live          
                from transaction_t
                where merchant_id = me.id 
                order by transaction_dt desc) 
 FROM merchant_t me
 LEFT OUTER JOIN trans_tt tt ON tt.merchant_id = me.id
 WHERE me.status = 'T'

Updated Code* - Also have you considered try creating a view and save summarized transactions in that for each month in DB and then you can use the excel and pass the month as a parameter. That way you don't have to work around excel.

declare @trans_month date = '2020-04-01'
       ,@trans_end_dt date = '2020-04-30'

;WITH trans_tt 
 AS (Select merchant_id,
        transaction_dt,
        COUNT(*) as num_transactions
     FROM transaction_t
     GROUP BY merchant_id,transaction_dt
    )

 SELECT DISTINCT  
   me.id, 
   me.merchant_num, 
   me.merchant_nm, 
   SUM(tt.num_transactions) as num_transactions
  ,[production_mode] = CASE WHEN me.status = 'A' 
                         THEN 'Yes' 
                         ELSE 'No' 
                         END   

 -- the date of the last transaction that uses IBank (IBA)
 [last_ibank_transaction] = (Select MAX(CASE WHEN bank_txt = 'IBA' 
                                             THEN transaction_dt
                                             ELSE NULL END)

                             FROM transaction_t 
                             WHERE merchant_id = me.merchant_id 
                             AND transaction_dt = tt.transaction_dt) --Remove this filter if the last_ibank_transaction can be outside the filtered month range
 -- the value of the "trans_live" column for the merchant's most recent transaction
,[is_live] = (select top 1 trans_live          
              from transaction_t
              where merchant_id = me.id 
              order by transaction_dt desc)
FROM merchant_t me
JOIN trans_tt tt ON tt.merchant_id = me.id
WHERE me.status = 'T'
AND tt.transaction_dt BETWEEN @trans_start_dt and @trans_end_dt
GROUP BY me.id, 
     me.merchant_num, 
     me.merchant_nm, 
    CASE WHEN me.status = 'A' 
         THEN 'Yes' 
         ELSE 'No' 
         END

Try this and its very close to what you had originally

SELECT distinct me.id, me.merchant_num, me.merchant_nm, count(1) as num_transactions
   ,[production_mode] = CASE WHEN me.status = 'A' 
                             THEN 'Yes' 
                             ELSE 'No' END    

   -- additional information needed as below, with comments -- the date of the last transaction that uses IBank (IBA)
   ,[last_ibank_transaction] = (SELECT max(transaction_dt) 
                                FROM transaction_t 
                                WHERE merchant_id = me.id 
                                AND bank_txt = 'IBA')

   -- the value of the "trans_live" column for the merchant's most recent transaction
   ,[is_live] =  (SELECT top 1 trans_live          
                  FROM transaction_t
                  WHERE merchant_id = me.id ORDER BY transaction_dt DESC) 
FROM merchant_t me
JOIN transaction_t tt  ON tt.merchant_id = me.id 
WHERE me.status = 'T'
AND tt.transaction_dt BETWEEN '2020-04-01' and '2020-04-30'
GROUP BY me.id, me.merchant_num, me.merchant_nm, me.status, tt.bank_txt

enter image description here

VTi
  • 1,309
  • 6
  • 14
  • Thanks Varun. I get an error "Msg 319, Level 15, State 1, Line 4 Incorrect syntax near the keyword 'with'." – user460114 Jun 03 '20 at 04:20
  • Oops. I missed the ; near the WITH .The DDLs of the tables in thew question helps in these cases. Anyways I have edited my answer now. Check if it helps. – VTi Jun 03 '20 at 13:36
  • This looks to be returning correct data. However, as I mentioned in my original post, I need to have the date range (the BETWEEN clause) separated out of the subquery, like Pratyush Srivastava's answer. That's because this query will be used in an excel report and the dates will be turned into excel parameters. Excel doesn't seem to allow parameters when the variables are buried in a subquery. – user460114 Jun 04 '20 at 04:29
  • Check my edited answer. Also have you considered try creating a view and save summarized transactions in that for each month in DB and then you can use the excel and pass the month as a parameter. That way you don't have to work around excel. – VTi Jun 04 '20 at 19:03
  • Hi Varun, Unfortunately, this returns many duplicates. See my edited original post, at the bottom for a screenshot. By the way, I corrected an error: I removed the "tt" alias from MAX(CASE WHEN tt.bank_txt = 'IBA' . – user460114 Jun 05 '20 at 10:21
  • Try my edited code now. Also, please consider sharing the DDLs of tables and some sample data in text format so that I can try to replicate. – VTi Jun 05 '20 at 12:52
  • Varun, still duplicates after I corrected errors. how can I send the sample data file? By email? – user460114 Jun 07 '20 at 09:05
  • You can share it as formatted text in your question itself and you can also share the DDLs. Check this https://meta.stackoverflow.com/questions/333952/why-should-i-provide-a-minimal-reproducible-example-for-a-very-simple-sql-query – VTi Jun 07 '20 at 11:38
  • Varun, I've added the SQL generation text to my original post. I've got 11 hours to award this bounty, apparently. – user460114 Jun 09 '20 at 08:29
  • Thanks. None of the merchant_ids match between the 2 tables. – VTi Jun 09 '20 at 15:12
  • Sorry, have fixed that now. – user460114 Jun 10 '20 at 05:30
  • Check my updated answer. Its very very close to what you already had except the referencing the table tt in last_ibank_transaction and replacing left outer to inner join. Let me know if it helps. The sample data is so helpful. – VTi Jun 10 '20 at 11:24
  • Thanks Varun, there are a number of errors with the query. Did you test it? I tried to fix them, but ended up with duplicates as before. There was a missing "group by" which I may not have added in the right place. – user460114 Jun 10 '20 at 19:33
  • I have updated my post, at the bottom, with my corrected query and the results I got. It seems like all the transactions are listed separately when they should be combined under one merchant record. Also, I want to stay with LEFT OUTER JOIN for now. Thanks so much for your patience. We're nearly there I think. – user460114 Jun 10 '20 at 19:44
  • You are not using the latest query that I have in my answer. scroll to the bottom. Also check my screenshot of results – VTi Jun 10 '20 at 19:48
  • OK, I got it. There are still dupes though. Please see updated post. – user460114 Jun 10 '20 at 19:54
  • The tables I provided do not have all the columns. They are large tables. I only provided enough for the example to hopefully work. Do you need the entire tables? Maybe there is some other field that is causing the duplication. – user460114 Jun 10 '20 at 19:57
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/215698/discussion-between-varunt-and-user460114). – VTi Jun 10 '20 at 20:02
0

You are already having a distinct set. So, keep that aside and use that distinct set with OUTER APPLY and get the results as given below:

As @Wouter suggested, you don't need DISTINCT again, as you are having GROUP BY.

SELECT torig.id
    torig.merchant_num,
        torig.merchant_nm, 
        torig.num_transactions, 
        torig.production_mode
    ,t1.max_transaction_dt
       ,t2.trans_live
FROM
(SELECT me.id, me.merchant_num, me.merchant_nm, count(1) as num_transactions
       ,CASE WHEN me.status = 'A' THEN 'Yes' ELSE 'No' END as production_mode

FROM merchant_t me
LEFT OUTER JOIN transaction_t tt
    ON tt.merchant_id = me.id 
    AND tt.transaction_dt BETWEEN '2020-04-01' and '2020-04-30'
WHERE me.status = 'T'
GROUP BY me.id, me.merchant_num, me.merchant_nm, me.status) as torig(id, merchant_num,merchant_nm, num_transactions, production_mode)

OUTER APPLY (
select max(transaction_dt)
           from transaction_t as tt where merchant_id = torig.id 
           and tt.bank_txt = 'IBA'
) as t1(max_transaction_dt)
OUTER APPLY
(
select top 1 trans_live          
from transaction_t
where merchant_id = torig.id order by transaction_dt desc
) as t2(trans_live)
Venkataraman R
  • 12,181
  • 2
  • 31
  • 58
  • Thanks but there are errors all over the place with this query. See screenshot at the end of the updated original post. – user460114 Jun 02 '20 at 09:24
0

The first query is Wrong: When you are aggregating, you can not leave out a dimension. if you are selecting it, you have to group by it also.

If you are sure you need just 1 record per merchant (612 records) consider moving production_mode to 2 different columns

Here is an optimized query:

select me.id, me.merchant_num, me.merchant_nm
        , count(*) as num_transactions
        ,sum(CASE WHEN me.status = 'A' THEN 1 ELSE 0 END) as production_mode
        ,sum(CASE WHEN me.status <> 'A' THEN 1 ELSE 0 END) as not_production
        ,max(transaction_dt) as last_ibank_transaction
        ,max(case when trans_rnk = 1 then trans_live else null end) as trans_live
       from 
        (
            select * , rank() over (partition by merchant_id order by transaction_dt desc) as trans_rnk 
            from transaction_t
        )aa 
        left join merchant_t bb on aa.merchant_id=bb.id
        where bank_txt = 'IBA' and transaction_dt BETWEEN '2020-04-01' and '2020-04-30'
       GROUP BY me.id, me.merchant_num, me.merchant_nm
  • After correcting the alias from "me" to "bb", I get the error "Msg 8117, Level 16, State 1, Line 6. Operand data type bit is invalid for max operator." – user460114 Jun 02 '20 at 19:17
  • And please see the Edit in my original post for further details.... – user460114 Jun 02 '20 at 19:37
  • Pratyush, are you still interested in this? I've been talking with Varun about his solution, but he's still getting duplicates. Your query is calculating the number of transactions correctly without duplicating merchants as in Varun's example. However, we need to return a record for every merchant in the database. We are so close! – user460114 Jun 11 '20 at 05:05
0
SELECT me.id, me.merchant_num, me.merchant_nm, sum(case when tt.merchant_id is null then 0 else 1 end) as num_transactions ,CASE WHEN me.status = 'A' THEN 'Yes' ELSE 'No' END as production_mode
                   -- additional information needed as below, with comments

                   -- the date of the last transaction that uses IBank (IBA)
                   ,A2.last_IBA_transaction_dt as last_ibank_transaction

                   -- the value of the "trans_live" column for the merchant's most recent transaction
                   ,A1.is_live is_live

                FROM merchant_t me
                left outer JOIN
                (
                select 
                transaction_dt,bank_txt,merchant_id,trans_live from transaction_t 
                --where transaction_dt BETWEEN '2020-04-01' and '2020-04-30'
                group by transaction_dt,bank_txt,merchant_id,trans_live
                )as tt
                     ON tt.merchant_id = me.id and tt.bank_txt = 'IBA' and tt.transaction_dt BETWEEN '2020-04-01' and '2020-04-30'
                     OUTER APPLY
        (
            -- use OUTER APPLY (not CROSS APPLY) in case there are merchants with no transactions
            -- The value of the "trans_live" column for the merchant's most recent transaction.
            -- Most recent across all dates, regardless of the filter in the CTE
            select top 1
                transaction_t.trans_live AS is_live
            from transaction_t
            where
                transaction_t.merchant_id = me.id  and transaction_t.bank_txt = 'IBA'
            order by transaction_dt desc
        ) AS A1
        OUTER APPLY
        (
            -- use OUTER APPLY (not CROSS APPLY) in case there are merchants with no transactions
            -- The date of the last transaction that uses IBank (IBA).
            -- The last transaction across all dates, regardless of the filter in the CTE
            select top 1
                transaction_t.transaction_dt AS last_IBA_transaction_dt
            from transaction_t
            where
                transaction_t.merchant_id = me.id
                and transaction_t.bank_txt = 'IBA'
            order by transaction_dt desc
        ) AS A2

                WHERE me.status = 'T' 

                GROUP BY me.id, me.merchant_num, me.merchant_nm, me.status, tt.bank_txt,A1.is_live,A2.last_IBA_transaction_dt
  • Praful, this query returns num_transactions = 1 for every merchant. But, I know for a fact that many of the merchants have more than one transaction. – user460114 Jun 04 '20 at 04:18
  • its depends on your business requirement, in above query i only pick latest transaction done based on merchantid and banktxt, if you want all transaction to same merchant id then just comment tt.rn=1 – Praful Bangal Jun 04 '20 at 09:45
  • kindly check my updated answer also check how many records return my left join inner query if it matches to your count then you just have to manipulate the query accordingly – Praful Bangal Jun 05 '20 at 19:59
  • This is fast and returns correct number of rows. However, as I mentioned in my original post, I need to use the query in an Excel report and the date range needs to be parametized, i.e. "WHERE transaction_dt BETWEEN ? and ?". It seems that Excel does not let you parametize columns that are inside subqueries. Unless someone can show me how that's done. – user460114 Jun 13 '20 at 04:50
  • 1
    Praful, I have just noticed some errors with your answer. The last_ibank_transaction column does not have correct data. Note that this column should not depend on the date range. The "is_live" column should also not depend on the date range. These two columns should be taken across the entire transactions table, regardless of the date of the transaction. – user460114 Jun 13 '20 at 05:13
  • @user460114 I got your approach, I have updated query accordingly now it will return the latest transaction date and is_live status irrespective of the transaction date filter. – Praful Bangal Jun 13 '20 at 20:06
0

Main problem : using DISTINCT which is performance degrading.At the same time it is not solving your problem i.e. to give unique merchant records.

I tried to remove Distinct at the cost of little lenghty script.But I am sure it solve the problem including Performance.

Variable or paramter should be of same data type as that of column data type.Data type mismatch is another performance degradation thing.

declare @start datetime='2020-04-01'
declare @end datetime='2020-04-30'
;

WITH CTE
AS (
    SELECT me.id
        ,me.merchant_num
        ,me.merchant_nm
        --, count(1) as num_transactions
        ,CASE 
            WHEN me.STATUS = 'A'
                THEN 'Yes'
            ELSE 'No'
            END AS production_mode
    FROM merchant_t me
    WHERE EXISTS (
            SELECT 1
            FROM transaction_t tt
            WHERE tt.merchant_id = me.id
                AND tt.transaction_dt >= @start
                    AND tt.transaction_dt <=@end
                AND me.STATUS = 'T'
            )
    )
    ,CTE1
AS (
    SELECT merchant_id
        ,max(transaction_dt) AS last_ibank_transaction
        ,max(0+trans_live) AS is_live
        ,count(1) AS num_transactions -- hope this correct
    FROM transaction_t tt
    INNER JOIN CTE me ON merchant_id = me.id
        AND tt.bank_txt = 'IBA'
    GROUP BY merchant_id
    )
SELECT me.id
    ,me.merchant_num
    ,me.merchant_nm
    ,num_transactions
    ,production_mode
    ,last_ibank_transaction
    ,is_live
FROM CTE me
INNER JOIN CTE1 tt ON tt.merchant_id = me.id

UNION ALL

SELECT me.id
    ,me.merchant_num
    ,me.merchant_nm
    ,NULL AS num_transactions
    ,CASE 
        WHEN me.STATUS = 'A'
            THEN 'Yes'
        ELSE 'No'
        END AS production_mode
    ,NULL AS last_ibank_transaction
    ,NULL is_live
FROM merchant_t me
WHERE NOT EXISTS (
        SELECT 1
        FROM transaction_t tt
        WHERE tt.merchant_id = me.id
            AND tt.transaction_dt >= @start
                    AND tt.transaction_dt <=@end
        and me.STATUS = 'T'
        )

Your Task : To make my query run as it is .Verify if it is giving correct output.

Lastly when everythnig is fine then we can decide whether to use CTE or #temp table depending upon number of rows involve and excecution plan.

Edit 1 : Table transaction_t is being queried 2 times so you can put them in #temp table.

I do not know if calclulation like max(transaction_dt) can be put in #Temp table.If yes then you can also put them in #temp table.

OR do the calculation from #temp table itself, is possible.

create table #temp(merchant_id int,transaction_dt datetime,trans_live bit)
insert into #temp
select merchant_id ,transaction_dt ,trans_live
            FROM transaction_t tt
            WHERE  tt.transaction_dt >= @start
                    AND tt.transaction_dt <=@end

You hv to correct the query suiting your requirement.

This way you can avoid CTE altogether.

Data Type : if possible change nvarchar,nchar to varchar and char.

If columns are not null then make it not null in design. It help in making index plan.

Index : If schema of both table are real then it should have both id as PK

[id] [int] IDENTITY(1,1) primary key in both table.

I will post about Non clustered index later on.

KumarHarsh
  • 5,046
  • 1
  • 18
  • 22
  • This returns all merchants in the database, but I need only test merchants in this example, i.e. where status='T'. I see you have included "status = 'T' in your query, but the results don't reflect that. Query is very fast though. Some of the other answers take several minutes. – user460114 Jun 13 '20 at 04:41