I've been noticing some problems with simple aggregate performance in Postgres (8.3). The issue is that if I have a table (say 200M rows) that is unique on (customer_id, order_id), then the query

    select customer_id, max(order_id) from larger_table group by customer_id
is more than an order of magnitude slower than a simple Java/JDBC program that does the following:
1) Initializes an empty HashMap customerMap (this will map customer_id -> max order_id)
2) Executes "select customer_id, order_id from larger_table" and gets a streaming result set
3) Iterates over the result set, at each row doing something like the following:
    long id = resultSet.getLong("customer_id");
    long order = resultSet.getLong("order_id");
    // keep the larger of the stored max and this row's order_id
    if (!customerMap.containsKey(id)) {
        customerMap.put(id, order);
    } else {
        customerMap.put(id, Math.max(order, customerMap.get(id)));
    }
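For reference, here is the whole loop as a self-contained sketch; it runs the same aggregation over an in-memory array of (customer_id, order_id) pairs instead of a JDBC ResultSet, so the names and data are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

public class MaxOrderAggregate {

    // Computes max(order_id) per customer_id, mirroring the ResultSet loop.
    // Each row is a {customer_id, order_id} pair.
    static Map<Long, Long> maxOrderPerCustomer(long[][] rows) {
        Map<Long, Long> customerMap = new HashMap<>();
        for (long[] row : rows) {
            long id = row[0];
            long order = row[1];
            // merge() is a compact equivalent of the containsKey/put branch:
            // insert if absent, otherwise keep the larger value.
            customerMap.merge(id, order, Math::max);
        }
        return customerMap;
    }

    public static void main(String[] args) {
        long[][] rows = {
            {1, 10}, {1, 42}, {2, 7}, {1, 5}, {2, 99}
        };
        Map<Long, Long> result = maxOrderPerCustomer(rows);
        System.out.println(result); // one entry per customer with its max order_id
    }
}
```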
Is this performance difference expected? I would think not, since I imagine the above is pretty close to what the database is doing internally. Is it evidence that something is wrong or incorrectly tuned in the db?