2

I understand that question is vendor dependent but ask should I worry if aggregate function like SUM operate on small type?

For example MariaDB uses 4 bytes for type INT. Developers may assume that each transaction have amount no bigger then a few thousands.

But what happen if we try to get income for a whole year for all departments? E.g.:

-- CREATE TABLE income (dt DATETIME, department INT, amount INT);
SELECT SUM(amount) FROM income WHERE dt BETWEEN '2014-01-01' and '2014-12-31'

It looks dumb to increasing storage size only to fix overflow issue with aggregate function SUM.

What I should worry about? Are there any guarantee or clarification from SQL 92/99/2008 standards?

Are there any special support from JDBC drivers?

Should I rewrite select in form:

SELECT SUM(CAST(amount AS BIGINT)) FROM income
  WHERE dt BETWEEN '2014-01-01' and '2014-12-31'
gavenkoa
  • 45,285
  • 19
  • 251
  • 303
  • 1
    I think you answered your own question. Yes it is vendor-specific, and casting as bigint will help. Could you overflow a bigint? theoretically. So you need to, based on the data you will be storing, could you possibly overflow that. – DBug Dec 18 '15 at 21:16
  • @a_horse_with_no_name Thanks for correction, fixed! If SQL server engine handle overflow what should I do on client side? I should select larger datatype in JDBC/ODBC client code? – gavenkoa Dec 18 '15 at 21:34
  • On the client side you have to take that into account, yes. –  Dec 18 '15 at 21:36
  • BTW: In general, for an income, I would always use `float`. It's the best versatile dataype for all kind of calculations. Also, floating-point operations are very fast, since every CPU has integrated a FPU (floating point unit). – SQL Police Dec 18 '15 at 21:46
  • Is that applicable to SQL **float** type? http://stackoverflow.com/questions/3439040/why-does-adding-two-decimals-in-javascript-produce-a-wrong-result – gavenkoa Dec 18 '15 at 21:52
  • 1
    @SQLOTL: except that `float` is an *approximate* data type and you can never be sure that you retrieve the same value that you store. It should **never** be used in any financial operatoin. http://floating-point-gui.de/ –  Dec 18 '15 at 21:53
  • 1
    @a_horse_with_no_name Yeah, I know these dicussions over `decimal` vs `float`. I also know the technical and mathematical fundamentals very well. Still, I find that ridiculous. Perhaps for banks, OK. But we have a mass data reporting system and we need **fast** real-time-processing of big sums and various calculations. Therefore I always use `float` which is much faster. Results are rounded at 2 digits. In 25 years, I **never** had a problem. Believe me, I have really experience with big companies. – SQL Police Dec 19 '15 at 14:07

3 Answers3

4

It's rather easy to test on mysql:

32bit overflow:

mysql> select sum(x) from (
    select pow(2,31) as x
    union all
    select pow(2,31)
    union all
    select pow(2,31)
) as bignums;
+------------+
| sum(x)     |
+------------+
| 6442450944 | // returned as a "bigint"
+------------+
1 row in set (0.00 sec)

64bit:

mysql> select sum(x) from (
    select pow(2,63) as x
    union all
    select pow(2,63)
    union all
    select pow(2,63)
) as bignums;
+-----------------------+
| sum(x)                |
+-----------------------+
| 2.7670116110564327e19 | // returned as float
+-----------------------+
1 row in set (0.00 sec)

Double:

mysql> select sum(x) from (
    select 1.7e+308 as x
    union all
    select 1.7e+308
    union all
    select 1.7e+308
) as bignums;
+--------+
| sum(x) |
+--------+
|      0 |
+--------+

It's rather easy to test on mysql:

32bit overflow:

mysql> select sum(x) from (
    select pow(2,31) as x
    union all
    select pow(2,31)
    union all
    select pow(2,31)
) as bignums;
+------------+
| sum(x)     |
+------------+
| 6442450944 | // returned as a "bigint"
+------------+
1 row in set (0.00 sec)

64bit:

mysql> select sum(x) from (
    select pow(2,63) as x
    union all
    select pow(2,63)
    union all
    select pow(2,63)
) as bignums;
+-----------------------+
| sum(x)                |
+-----------------------+
| 2.7670116110564327e19 | // returned as float
+-----------------------+
1 row in set (0.00 sec)

Double:

mysql> select sum(x) from (
    select 1.7e+308 as x
    union all
    select 1.7e+308
    union all
    select 1.7e+308
) as bignums;
+--------+
| sum(x) |
+--------+
|      0 |
+--------+

comment followup:

mysql> describe overflow
    -> ;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| x     | int(11)    | YES  |     | NULL    |       |
| y     | bigint(20) | YES  |     | NULL    |       |
| z     | double     | YES  |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> select * from overflow;
+------------+---------------------+---------+
| x          | y                   | z       |
+------------+---------------------+---------+
| 2147483647 | 9223372036854775807 | 1.7e308 |
| 2147483647 | 9223372036854775807 | 1.7e308 |
| 2147483647 | 9223372036854775807 | 1.7e308 |
+------------+---------------------+---------+
3 rows in set (0.00 sec)

mysql> select sum(x), sum(y), sum(z) from overflow;
+------------+----------------------+--------+
| sum(x)     | sum(y)               | sum(z) |
+------------+----------------------+--------+
| 6442450941 | 27670116110564327421 |      0 |
+------------+----------------------+--------+
1 row in set (0.00 sec)
Marc B
  • 356,200
  • 43
  • 426
  • 500
  • 1
    What data type does `pow(2,31)` return? bigint or int ? – SQL Police Dec 18 '15 at 21:23
  • well, i get the exact same value output if I hardcode 2147483648 instead of pow(...). pow(2,63) does return a float. – Marc B Dec 18 '15 at 21:24
  • hmm. having trouble saving another edit, but hardcoding the 2^63 value returns 27670116110564327424 , which exceeds a 64bit int's max. very interesting/weird. – Marc B Dec 18 '15 at 21:29
  • Hm well, but then, `pow(2,31)` does not return a 4-byte `int`, and so the situation is different from what the OP asked. For a correct test, we should setup a test table with dataype `int`, and fill at least two records, such that there will happen an overflow when summing up. – SQL Police Dec 18 '15 at 21:44
  • 1
    See last edit above. set up a new table, and basically got the same results. Looks like it just fails "upwards" to a larger capacity type, until it hits a real ceiling and fails as "0". – Marc B Dec 18 '15 at 21:50
  • Note that Postgres will not silently return zero for the "overflow" example, it will correctly report an error. –  Dec 18 '15 at 21:57
  • Oracle will actually return "infinity" for the overflow example with the data type `binary_double`. –  Dec 18 '15 at 22:08
2

Postgres handles this without overflow or truncation:

From the manual:

sum(expression), Return Type: bigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data type

http://www.postgresql.org/docs/current/static/functions-aggregate.html

And a quick test proves that:

psql (9.4.5)
Type "help" for help.

postgres=> create table x (amount int);
CREATE TABLE
postgres=>
postgres=> insert into x values (2147483647), (2147483647);
INSERT 0 2
postgres=> select sum(amount)
wbtest-> from x;
    sum
------------
 4294967294
(1 row)

postgres=>

Interesting enough the SQL standard requires the statement to fail in this situation:

If, during the computation of the result of AF, an intermediate result is not representable in the declared type of the site that contains that intermediate result, then
...
Otherwise, an exception condition is raised: data exception — numeric value out of range.

(AF = aggregate function)

1

When I understand you right, you are asking what happens in case of an overflow.

At least for SQL Server, look up this documentation:

https://msdn.microsoft.com/de-de/library/ms187810%28v=sql.120%29.aspx

Here it says what the return type of sum() is for specific input types:

Expression result               Return type
------------------------------------------------
tinyint                         int
smallint                        int
int                             int
bigint                          bigint
decimal category (p, s)         decimal(38, s)
money and smallmoney category   money
float and real category         float 

That means, there can indeed be an overflow. So I recommend you to use the type float or money for salaries, instead of the type int.

SQL Police
  • 4,127
  • 1
  • 25
  • 54
  • Thanks, at least some vendors supply info. https://mariadb.com/kb/en/mariadb/sum/ have no any statements (( I reached vote up limit, will do tomorrow. – gavenkoa Dec 18 '15 at 21:32