2

Recently I've been informed by a StackOverflow user that using BETWEEN operator with values of data type timestamp without time zone should not be used. Below is the quote.

Between means >= and <= and shall not be used with ranges that contain timestamps.

When asked for an explanation of this thesis or a link to Postgres documentation where it states that I've got an answer saying

Why would such a simple thing need a site with documentation. I am sure you can find many anyway if you google (at least my detailed posts on various forums demonstrating the case)

Well I googled. And found nothing that would advise against using this operator with timestamp values. In fact this answer on SO uses them and so does this mailing group post.

I was informed that all these years I was doing it wrong. Is it really the case?

As far as I know Postgres max precision for a timestamp is 1 microsecond - correct me if I'm wrong. Thus aren't below statements equivalent ?

sample_date BETWEEN x AND y::timestamp - INTERVAL '1 microsecond'

and

sample_date >= x AND sample_date < y

Edit: The sample is just a consideration of the difference. I'm aware of the fact that developers can miss the time part, but assuming one knows how it behaves, why should it not be used? Generally speaking, this is merely a sample, but I'm wondering about the bigger scope. I've been investigating the planner and it seems to be parsing BETWEEN to >= AND <=.

Why does one preferably write >= AND <= than BETWEEN in the matter of results - not including the time to translate it?

Community
  • 1
  • 1
Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • The second is way preferable. It makes no assumptions about the underlying implementation (plus it is shorter and easier to read). Here is a blog post about the use of `between` in SQL Server. Many of the points also apply to Postgres: https://sqlblog.org/2011/10/19/what-do-between-and-the-devil-have-in-common. – Gordon Linoff Sep 22 '16 at 15:40
  • @GordonLinoff thanks for your input. This example is just to consider difference between both statements. Main question here asks whether or not that user was right pointing me that BETWEEN should not be used (generally) with timestamps - not the mentioned case only. I'll make sure to read that article, thanks again. – Kamil Gosciminski Sep 22 '16 at 15:42
  • 1
    You can use it, but it is clumsy and ugly, and since it uses the epsilon `t BETWEEN a AND b - epsilon`, it will depend on the epsilon. The mathematical convention with an half-open interval `t >= c1 AND t < d1` is much clearer, IMHO. Also: it is stackable, without overlap: `t2 >= c2 AND t2 = c1 AND t < d2` whenever `c2=d1` – joop Sep 22 '16 at 16:18
  • @joop thanks for your comment. But then again - this sample was just to prove the point. Generally, the last question is the matter I'm worried about, and IMHO I shouldn't be . . . – Kamil Gosciminski Sep 22 '16 at 16:22
  • @KamilG. . . . It should *not be used* because it is less clear and more prone to error. Experienced programmers try to avoid such constructs when they are trying to write maintainable code (and perhaps one day move off to a different project). – Gordon Linoff Sep 22 '16 at 23:58
  • 1
    @GordonLinoff even for implementing simple `>= and <=` case? Isn't it equal and cleaner? – Kamil Gosciminski Sep 23 '16 at 01:00

2 Answers2

1

There is absolutely nothing wrong with using ts BETWEEN validfrom AND validto instead of ts >= validform AND ts <= validto. They are the same.

I can only guess, but I'd say that the warning targets something different, namely whether either of the (identical) clauses above are the right thing to use.

Now this of course depends on what you are trying to do, but very often clauses like this are used to identify the one valid row for a particular timestamp. In that case the clause as above is wrong, because for a value of ts when the row was changed, you would get two results.

Consider this:

CREATE TABLE names (
   id integer PRIMARY KEY,
   val text NOT NULL,
   validfrom timestamptz NOT NULL,
   validto timestamptz NOT NULL
);

INSERT INTO names VALUES (1, 'Smith', '1985-05-02 00:00:00', '2009-01-30 00:00:00');
INSERT INTO names VALUES (2, 'Jones', '2009-01-30 00:00:00', 'infinity');

This is meant to be a historized table of names for a person.

If you use a WHERE clause like above to query for the name valid at a certain time, it would work well for

SELECT val FROM names
WHERE current_timestamp BETWEEN validfrom AND validto;

But it would do the wrong thing for

SELECT val FROM names
WHERE '2009-01-30' BETWEEN validfrom AND validto;

That is because the end point of the interval of validity for a name is not part of the interval. For this case, it would be correct to write:

SELECT val FROM names
WHERE '2009-01-30' >= validfrom AND '2009-01-30' < validto;
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • I'd like to point out that the question was about __generally advising against using BETWEEN with timestamps__. Your example presents something I'm well aware of. I even wrote it that the precision of Postgres timestamp is 1 microsecond. Both statements presented by you are not equal in terms of result, but in my example they are. :-) – Kamil Gosciminski Sep 23 '16 at 11:19
  • I answered your question in the first paragraph --- there is no problem whatsoever with using `BETWEEN`. The rest is just an attempt to explain why the notion might have come up that there is something wrong with `BETWEEN`. – Laurenz Albe Sep 23 '16 at 11:24
  • Thanks. If that is the case, then it seems that someone didn't actually understand me . . . – Kamil Gosciminski Sep 23 '16 at 11:28
0

I assume this is a reference the SO user would have shared: https://wiki.postgresql.org/wiki/Don't_Do_This

Don't use BETWEEN with timestamps

I used BETWEEN with two timestamps in a recent query and it worked but took much longer than expected. So, I can't verify any specific issue except, perhaps, performance; although this could be due to my giant data set.

I retried the same query with the >= timestamp1 AND < timestamp2 pattern and performance was about the same.

I suppose a real concern here might be using timestamps without timezone and being sure you're getting the right records?

James
  • 66
  • 3