I have the following table data.
CREATE TABLE [dbo].[Accounts](
[AccountID] [int] IDENTITY(1,1) NOT NULL,
[StartDate] [datetime] NULL,
[EndDate] [datetime] NULL
)
INSERT INTO [dbo].[Accounts]
([StartDate]
,[EndDate])
VALUES
('01/01/2012' ,'02/01/2012'),
('01/06/2012' ,'07/01/2012'),
('01/08/2012' ,'11/01/2012'),
('01/11/2012','01/01/2013'),
('02/07/2012' ,'01/01/2013'),
('04/01/2012' ,'01/01/2013'),
('06/01/2012' ,'01/01/2013'),
('09/01/2012' ,'01/01/2013'),
('11/01/2012' ,'01/01/2013'),
('12/01/2012' ,'01/01/2014'),
('01/01/2013' ,'02/01/2014'),
('01/06/2013' ,'07/01/2014'),
('01/08/2013' ,'11/01/2014'),
('01/11/2013','01/01/2014'),
('02/07/2013' ,'01/01/2014'),
('04/01/2013' ,'01/01/2014'),
('06/01/2013' ,'01/01/2014'),
('09/01/2013' ,'01/01/2014'),
('11/01/2013' ,'01/01/2014'),
('12/01/2013' ,'01/01/2015'),
('01/01/2014' ,'02/01/2015'),
('01/06/2014' ,'07/01/2015'),
('01/08/2014' ,'11/01/2015'),
('01/11/2014','01/01/2015'),
('02/07/2014' ,'01/01/2015'),
('04/01/2014' ,'01/01/2015'),
('06/01/2014' ,'01/01/2015'),
('09/01/2014' ,'01/01/2015'),
('11/01/2014' ,'01/01/2015'),
('12/01/2014' ,'01/01/2015'),
('01/01/2014' ,'02/01/2015'),
('01/06/2014' ,'07/01/2015'),
('01/08/2014' ,'11/01/2015'),
('01/11/2014','01/01/2015'),
('02/07/2014' ,'01/01/2015'),
('04/01/2013' ,'01/01/2014'),
('06/01/2013' ,'01/01/2014'),
('09/01/2013' ,'01/01/2014'),
('11/01/2013' ,'01/01/2014'),
('12/01/2013' ,'01/01/2015')
SELECT datename(month, [StartDate])+' '+datename(year, [StartDate]) AS 'Start Date',
COUNT([AccountID]) AS 'Accounts by Month'
FROM [dbo].[Accounts]
GROUP BY datename(month, [StartDate])+' '+datename(year, [StartDate]);
I need two queries to get the running counts grouped by "Year-month". Result will have two columns "Month-Year" and "Running Count of Accounts".
- Running count of Accounts grouped by StartDate( Year,Month Level)
- Running count of Accounts taking EndDate into consideration, meaning any accounts that has EndDate past the StartDate should not be counted in the running count, again grouped by StartDate( Year,Month Level)
I would like the solution in T-SQL, HiveQL or even using PySpark DataFrames API.