4

I have a database implementation task on SQL Server, where there is a table A containing a column - yearMonth. I don't need date operations such as calculating how many days or month between two dates etc. currently. The yearMonth can be defined as Date, Int, or varchar(6). From the perspective of saving data space, 4-bytes Int is obviously the best choice as only 6-digit int is needed e.g. 201701. Whereas varchar(6) occupies 6 bytes, Date occupies 2x4 bytes. (applied for most databases)

But what about from the perspective of indexing, esp. in case of index range scan?

  • If the column yearMonth is defined as varchar(6), the index range scan can happen when using query select .. from A where yearMonth IN (...)
  • If the column yearMonth is defined as Int or Date, then index range scan can happen with operator like <=, <= etc.

In the cases above, which type of column definition is more efficient when index range scan happens?

Rui
  • 3,454
  • 6
  • 37
  • 70

3 Answers3

2

Most (if not all) DBMS essentially store a date as an integer anyway, and for DateTime it is two integers, one for the date and one for the time, so there will be little or no difference between the two. I think your biggest consideration would be how you intend to use the column, if you want to do any kind of date operations on the column, then store it as a date (defaulting to the 1st day of the month). For instance, if you want to know how many months between 201604 AND 201701 it is easier using a date, if you want to format your value as something like April 2017 it is a lot easier if it is stored as a date.

Another consideration is validation, if you have a varchar(6) or int, you would need additional check constraints to ensure that any value entered is in fact a valid date, anyone could easily enter 999999, and while the year is valid, the month is not, and for varchar the possibilities are endless for the nonsense that can be entered.


Now that you have tagged SQL Server I can answer more definitively - both DATE and INT take up 4 bytes of storage so there iS no space saved, and from testing both perform pretty much exactly the same (date performs marginally, but not significantly better and often with fewer reads), so there is no benefit to using int (unless you don't want to be restricted to just valid dates)

I have done some quick tests using the following schema:

CREATE TABLE dbo.TDate (ID INT IDENTITY(1, 1) PRIMARY KEY, DT DATE NOT NULL);
INSERT dbo.TDate (DT)
SELECT TOP 100000 DATEADD(MONTH, RAND(CHECKSUM(NEWID())) * 300, '20000101')
FROM sys.all_objects a, sys.all_objects b;

CREATE NONCLUSTERED INDEX IX_TDate_DT ON dbo.TDate (DT);

CREATE TABLE dbo.TInt(ID INT IDENTITY(1, 1) PRIMARY KEY, DT INT NOT NULL);
INSERT dbo.TInt (DT)
SELECT (DATEPART(YEAR, DT) * 100) + DATEPART(MONTH, DT)
FROM dbo.TDate;

CREATE NONCLUSTERED INDEX IX_TInt_DT ON dbo.TInt (DT);

Then running this to compare performance

DECLARE @D1 DATE = (SELECT TOP 1 DT FROM dbo.TDate ORDER BY NEWID());
DECLARE @D2 DATE = (SELECT TOP 1 DT FROM dbo.TDate WHERE DT > @D1 ORDER BY NEWID());
DECLARE @I1 INT = (DATEPART(YEAR, @D1) * 100) + DATEPART(MONTH, @D1),
        @I2 INT = (DATEPART(YEAR, @D2) * 100) + DATEPART(MONTH, @D2);


SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT  COUNT(*)
FROM    dbo.TDate
WHERE   DT >= @D1
AND     DT < @D2;

SELECT  COUNT(*)
FROM    dbo.TInt
WHERE   DT >= @I1
AND     DT < @I2;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GarethD
  • 68,045
  • 10
  • 83
  • 123
  • I actually don't need date operations at all, so I think `Int` is enough as I only need to filter with a range. But what about the indexing? Int or Varchar(6) ? Which is more efficient in case of filtering with a range? – Rui Apr 07 '17 at 09:27
  • You can't do a range scan on varchar, so it is moot. Even if you could do like for like queries `int` will be faster than `varchar`. If you are having legitimate concerns about it, then do some tests specific to your scenario. The best way to find out which is faster is to [race your horses](https://ericlippert.com/2012/12/17/performance-rant/) – GarethD Apr 07 '17 at 09:33
0

Int should be more faster than Varchar(6), as it takes less space.

SQL SELECT speed int vs varchar
Community
  • 1
  • 1
Biswabid
  • 1,378
  • 11
  • 26
  • Thanks a lot for this hint :) The link has the tag postgresql, but the explanation kinda applies to most popular RDBMS :) – Rui Apr 07 '17 at 08:58
0

But what about from the perspective of indexing, esp. in case of index range scan?

Range scan fastness is not limited by type of index,but its effectiveness is limited by fragmentation .Your range scan query will be faster,if fragmentation is less,less fragmentation means all pages are adjacent and not scattered

TheGameiswar
  • 27,855
  • 8
  • 56
  • 94
  • Whether the pages are adjacent and not scattered seems to be unpredictable. Well, I just use the database with default configuration – Rui Apr 07 '17 at 09:09
  • it doesn't depend on database with default configuration,it depends on the key you choose ,which may case fragmentation .check out this https://www.sqlskills.com/blogs/kimberly/more-considerations-for-the-clustering-key-the-clustered-index-debate-continues/ – TheGameiswar Apr 07 '17 at 09:10
  • How do I choose the key? – Rui Apr 07 '17 at 09:18
  • check out the link,i gave – TheGameiswar Apr 07 '17 at 09:23