41

We had this programming discussion on Freenode and this question came up when I was trying to use a VARCHAR(255) to store a Date Variable in this format: D/MM/YYYY. So the question is why is it so bad to use a VARCHAR to store a date. Here are the advantages:

  1. Its faster to code. Previously I used DATE, but date formatting was a real pain.
  2. Its more power hungry to use string than Date? Who cares, we live in the Ghz era.
  3. Its not ethically correct (lolwut?) This is what the other user told me...

So what would you prefer to use to store a date? SQL VARCHAR or SQL DATE?

Dale K
  • 25,246
  • 15
  • 42
  • 71
mahen23
  • 720
  • 2
  • 11
  • 24
  • Apparently, SQL was the first language to have temporal data types (before then everyone used text for dates and the Y2K problem never actually materialized, did it?) It would be insulting to the designers of SQL, at least one of whome is a friend of mine on Facebook, to not use them when applicable. – onedaywhen Jan 21 '11 at 15:04
  • 1
    Good question! I reviewed the answers. I've used all of the suggestions. I don't buy the "always use a date type column" answers. There are valid reasons for using VARCHAR columns. In fact you may find the best answer, if space isn't a problem to use a combination of data types to store date time values. – Dale Oct 07 '20 at 21:32

5 Answers5

55

Why not put screws in with a hammer?

Because it isn't the right tool for the job.

Some of the disadvantages of the VARCHAR version:

  • You can't easily add / subtract days to the VARCHAR version.
  • It is harder to extract just month / year.
  • There is nothing stopping you putting non-date data in the VARCHAR column in the database.
  • The VARCHAR version is culture specific.
  • You can't easily sort the dates.
  • It is difficult to change the format if you want to later.
  • It is unconventional, which will make it harder for other developers to understand.
  • In many environments, using VARCHAR will use more storage space. This may not matter for small amounts of data, but in commercial environments with millions of rows of data this might well make a big difference.

Of course, in your hobby projects you can do what you want. In a professional environment I'd insist on using the right tool for the job.

vincentvanjoe
  • 767
  • 1
  • 12
  • 23
Kramii
  • 8,379
  • 4
  • 32
  • 38
  • @Dercsár: Indeed. And there are occasions when putting dates in a VARCAR is useful, too. But it isn't generally recommended. – Kramii Jan 21 '11 at 14:27
  • 2
    @Matt: My father (who was himself from Birmingham) sometimes used the term "Brummie screwdriver" instead of "hammer". I guess the hammer-driver habit has spread south? :-) – Kramii Jan 21 '11 at 14:33
  • You're right about using the right tool for the job. Other than that, not so much - in my opinion. – Dale Oct 07 '20 at 21:41
17

When you'll have database with more than 2-3 million rows you'll know why it's better to use DATETIME than VARCHAR :)

Simple answer is that with databases - processing power isn't a problem anymore. Just the database size is because of HDD's seek time.

Basically with modern harddisks you can read about 100 records / second if they're read in random order (usually the case) so you must do everything you can to minimize DB size, because:

  • The HDD's heads won't have to "travel" this much
  • You'll fit more data in RAM

In the end it's always HDD's seek times that will kill you. Eg. some simple GROUP BY query with many rows could take a couple of hours when done on disk compared to couple of seconds when done in RAM => because of seek times.

For VARCHAR's you can't do any searches. If you hate the way how SQL deals with dates so much, just use unix timestamp in 32 bit integer field. You'll have (basically) all advantages of using SQL DATE field, you'll just have to manipulate and format dates using your choosen programming language, not SQL functions.

Slawek
  • 762
  • 4
  • 7
  • 3
    Of course, if you're storing it in a 32-bit integer field, you also need to be aware of the [Year 2038 problem](https://en.wikipedia.org/wiki/Year_2038_problem). – Powerlord Jan 29 '16 at 19:22
  • Thanks for the epoch idea, manipulating dates drives me insane :) – Andrew Schultz Jan 06 '18 at 10:20
6

Two reasons:

  • Sorting results by the dates
  • Not sensitive to date formatting changes

So let's take for instance a set of records that looks like this:

5/12/1999 | Frank N Stein
1/22/2005 | Drake U. La
10/4/1962 | Goul Friend

If we were to store the data your way, but sorted on the dates in assending order SQL will respond with the resultset that looks like this:

1/22/2005 | Drake U. La
10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein

Where if we stored the dates as a DATETIME, SQL will respond correctly ordering them like this:

10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein
1/22/2005 | Drake U. La

Additionally, if somewhere down the road you needed to display dates in a different format, for example like YYYY-MM-DD, then you would need to transform all your data or deal with mixed content. When it's stored as a SQL DATE, you are forced to make the transform in code, and very likely have one spot to change the format to display all dates--for free.

Berin Loritsch
  • 11,400
  • 4
  • 30
  • 57
  • See my answer regarding ISO 8601 below. – Nicholas Carey Jan 21 '11 at 19:42
  • This is definitely NOT a reason. With the appropriate date format, e.g. yyyy/MM/dd HH:mm:ss.SSS, the values are definitely sortable. What I really like about using strings for dates in databases is they're easy to read, regardless of the time zone you're reading them in. I just make sure I put the UTC date time in string form, and then regardless of where that database is in the world, i know what that time is. Database tools that show actual date time values (as opposed to strings that look like them) like to convert the value to your local time, which isn't always helpful. – Dale Oct 07 '20 at 21:18
  • @Dale, do not be fooled into thinking that data will always be inserted with the same date format. I've worked on many systems that have been in service for over a decade. Someone will have the bright idea to change something, and they will assume the format in the database is actually a date-time type... because that is the RIGHT thing to do. – Berin Loritsch Oct 07 '20 at 21:41
  • @Berin Thanks. :-) I concur, but adding data with a different date format would be a bug, not a bright idea, unless the time was taken to update all the existing data to the new format, and to adjust all the existing logic that depended on the old format, at the same time. – Dale Oct 08 '20 at 14:16
  • Ah, and you don't have any requirements to support users in other locales, calculate timespans between dates, or any of the other common things with dates? Like a very fundamental showing timestamps in the right time-zone for the user. If you are building a toy or a proof of concept, do what you want. But if you are building something that will be used around the globe, then use the tools that help you do that. – Berin Loritsch Oct 08 '20 at 16:02
4

Between DATE/DATETIME and VARCHAR for dates I would go with DATE/DATETIME everytime. But there is a overlooked third option. Storing it as a INTEGER unsigned!

I decided to go with INTEGER unsigned in my last project, and I am really satisfied with making that choice instead of storing it as a DATE/DATETIME. Because I was passing along dates between client and server it made the ideal type for me to use. Instead of having to store it as DATE and having to convert back every time I select, I just select it and use it however I want it. If you want to select the date as a "human-readable" date you can use the FROM_UNIXTIME() function.

Also a integer takes up 4 bytes while DATETIME takes up 8 bytes. Saving 50% storage.

The sorting problem that Berin proposes is also solved using integer as storage for dates.

Josua Pedersen
  • 620
  • 1
  • 6
  • 13
  • 1
    Please note that a datetime data type is an integer (two, actually): the leftmost is the number of days since the epoch, the rightmost is the number of millisecond ticks since start-of-day (00:00:00.000). The epoch (zero-point in calendar-speak) of SQL Server's calandar is 1 January 1900 00:00:00.000 — this is why `convert(datetime,'')` yields a datetime value of 1 January 1900. – Nicholas Carey Jan 21 '11 at 19:32
4

I'd vote for using the date/datetime types, just for the sake of simplicity/consistency.

If you do store it as a character string, store it in ISO 8601 format:

Among other things, ISO 8601 date/time string (A) collate properly, (B) are human readable, (C) are locale-indepedent, and (D) are readily convertable to other formats. To crib from the ISO blurb, ISO 8601 strings offer

representations for the following:

  • Date
  • Time of the day
  • Coordinated universal time (UTC)
  • Local time with offset to UTC
  • Date and time
  • Time intervals
  • Recurring time intervals

Representations can be in one of two formats: a basic format that has a minimal number of characters and an extended format that adds characters to enhance human readability. For example, the third of January 2003 can be represented as either 20030103 or 2003-01-03.

[and]

offer the following advantages over many of the locally used representations:

  • Easily readable and writeable by systems
  • Easily comparable and sortable
  • Language independent
  • Larger units are written in front of smaller units
  • For most representations the notation is short and of constant length

One last thing: If all you need to do is store a date, then storing it in the ISO 8601 short form YYYYMMDD in a char(8) column takes no more storage than a datetime value (and you don't need to worry about the 3 millisecond gap between the last tick of the one day and the first tick of the next. But that's a matter for another discussion. If you break it up into 3 columns — YYYY char(4), MM char(2), DD char(2) you'll use up the same amount of storage, and get more options for indexing. Even better, store the fields as a short for yyyy (4 bytes), and a tinyint for each of MM and DD — now you're down to 6 bytes for the date. The drawback, of course, to decomposing the date components into their constituent parts is that conversion to proper date/time data types is complicated.

Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135
  • What a great answer! FYI I have chosen to use VARCHAR in database tables, more than once, for the reasons mentioned above. (I've also used date type columns, and numeric ones where I stored UNIX epoch time (?) values) It all depended on the problem. When I do use VARCHAR columns, I always store the time in UTC so there's never ever any confusion when looking at the values and its always straight forward to convert to a Date type object. – Dale Oct 07 '20 at 21:27