2

There are more than 10,00,000 records in the table, I am working on. I need to perform an asynchronous operation(a push queue) for each record. Getting all the records at once and processing each record in a loop feels like a bad idea. Instead, I want to fetch records in batches and loop over each batch. Read somewhere on the internet about querying in batches using setFetchSize(int n) and my DAO looks like:

public List<UserPreferenceDTO> getUserPreferences() {
    String sqlQueryString = "select us.id as userId, pf.id as preferenceId from users us, preferences pf where us.id = pf.user_id;";
    SQLQuery sqlQuery = (SQLQuery) session.createSQLQuery(sqlQueryString).setFetchSize(200);
    return sqlQuery.addScalar("userId").addScalar("preferenceId").setResultTransformer(new AliasToBeanResultTransformer(UserPreferenceDTO.class)).list();
    }

My Service class looks like:

List<UserPreferenceDTO> userPreferenceDTOs = userDeviceDao.getUserPreferences();

    for(UserPreferenceDTO userPreferenceDTO: userPreferenceDTOs ){
         pushToRabbitMQ(userPreferenceDTO);
    }

I need to get "N" records from the DB push them to the queue for processing then get another "N" records push them to queue and so on till all the records are pushed to queue.

user2160919
  • 173
  • 9

1 Answers1

0

A reasonable setFetchSize() is a must in any batch load scenario as the database won't have to send each row separately. Even if your roundtrip to the database is just 10ms it's still 10ms * 10mln ~ 28 h to do it for all the rows. The improvement usually plateaus somewhere around 1000 but this depends on your environment setup so you need to test it.

It might be enough to replace .list() with .scroll() which returns ScrollableResults which allows to read one record at a time. This will however depend on the database, some like MySQL will fake the scrolling and load the entire result set.

If that's the case you need to use ORDER BY in your query with setFirstResult() and setMaxResult(). This will execute new query to read each batch. It's the safest approach but ORDER BY might be an expensive statement.

Karol Dowbecki
  • 43,645
  • 9
  • 78
  • 111
  • PostgreSQL JDBC does support `fetchSize`, however there are limitations listed in the documentation: https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor – Vladimir Sitnikov Apr 10 '18 at 09:42