Retrieving distinct rows where columns have the same value and iterating by not having a cursor timeout?

Question

I have seen this, this and this, but none of the answers solve my problem. Even group by isn't working because of a sql_mode=only_full_group_by error.

I have a table with rows:
id, cId, aId, eId, val
1 231 6 56 4
2 231 6 56 7
3 451 6 30 4
4 762 7 56 4
5 342 7 32 5
6 453 7 34 6
7 342 7 32 8

I want distinct rows based on cId,aId and eId only. So the result will be:
231 6 56
451 6 30
762 7 56
342 7 32
453 7 34

The two challenges:
1. What query could give me this result?
2. The DB contains around 16 million rows. Once I retrieve the unique rows, I'll be iterating it via ResultSet of Java. I've encountered a problem with MongoDB, that when iterating a very long result set, the cursor's connection to the database times out even though I'm actively iterating. I've been told that the same can happen with MySQL if it is a very large result set.

So my question here is not just about retrieving those distinct rows, but also about getting it in such a way that I'd be able to iterate on chunks of maybe 10000 of them at a time. I know of the limit command, but I don't know how to continue retrieving from the 10001'th result after running a select with limit 10000.

score 1 · Accepted Answer · edited May 18 '16 at 09:56

How about

select distinct cId,aId, eId from TheTable;

also see about sql pagination (if necessary).

You'd get the results in pages with repeated commands in Java, like this:

select distinct cId, aId, eId from TheTable limit 1, 10000;
select distinct cId, aId, eId from TheTable limit 10001, 10000;
select distinct cId, aId, eId from TheTable limit 20001, 10000;
select distinct cId, aId, eId from TheTable limit 30001, 10000;

and so on...
You'd need a variable to change the 1 to 10001 and then to 20001 etc. What those statements mean, is select something from TheTable and limit it to the results of the first row of the results to the 10000'th row of the results. The second select does a select from the 10001'th row of the distinct results to the 10001+10000'th row.

But before starting with the pagination, it'd help to know the total number of results you are going to iterate through. You can do that with:

select count(distinct cId,aId, eId) from TheTable;

Retrieving distinct rows where columns have the same value and iterating by not having a cursor timeout?

1 Answers1