2

I have Spring Boot app running on Wildfly 18.0.1. The main purpose of the app is: every 5 minutes run some job. So I make:

TaskScheduler: initialize scheduler

@Autowired
ThreadPoolTaskScheduler taskScheduler;
taskScheduler.scheduleWithFixedDelay(new ScheduledVehicleDataUpdate(), 300000);

ScheduledVehicleDataUpdate: scheduler that runs updater

public class ScheduledVehicleDataUpdate implements Runnable {
    @Autowired
    TaskExecutor taskExecutor;

    @Override
    public void run() {
        try {
            CountDownLatch countDownLatch;
            List<VehicleEntity> vehicleList = VehicleService.getInstance().getList();
            if (vehicleList.size() > 0) {
                countDownLatch = new CountDownLatch(vehiclesList.size());
                vehicleList.forEach(vehicle -> taskExecutor.execute(new VehicleDataUpdater(vehicle, countDownLatch)));
                countDownLatch.await();
            }
        }
        catch (InterruptedException | RuntimeException e) {
            System.out.println(e.getMessage())
        }
    }
}

TaskExecutor:

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(23);
    executor.setMaxPoolSize(23);
    executor.setQueueCapacity(5000);
    executor.setThreadNamePrefix("VehicleService_updater_thread");
    executor.initialize();
    return executor;
}

VehicleDataUpdater: main updater class

public class VehicleDataUpdater implements Runnable {
    private final VehicleEntity vehicle;
    private final CountDownLatch countDownLatch;

    public VehicleDataUpdater(VehicleEntity vehicle, CountDownLatch countDownLatch) {
        this.vehicle = vehicle;
        this.countDownLatch = countDownLatch;
    }

    @Override
    public void run() {    
        try {
            this.updateVehicleData();
        }
        catch (Exception e) {
            System.out.println(e.getMessage());
        }
        finally {
            countDownLatch.countDown();
        }
    }

    public void updateVehicleData() {
        // DO UPDATE ACTIONS;
    }
}

The issue is that after finish ScheduledVehicleDataUpdate the memory is NOT clearing. It looks like this: enter image description here

Every step the memory is growing, growing, growing and at unpredictable moment all memory is freed up. And objects from first iteration, and object from last iteration. In the most bad case it takes all available memory (120Gb) and Wildfly crashes.

I have about 3200 VehicleEntity records (lets assume exactly 3200). So I've looked for the VehicleDataUpdater - how much objects there are in memory. After first iteration (when I only started app) it is less than 3200 but not zero - maybe about 3000-3100. And every step it grows but not exactly on 3200 records. That means that some of objects clears from memory but most of them remains there.

Next: normal duration of iteration is about 30sec - 1min. When memory is not clear up and continue growing then each iteration get more and more time: the longest that I saw was 30 minutes. And threads from pool are mostly in "monitor" state, i.e. there are some locks waiting to release. Possibly locks from previous iterations that was not released - and question again - why all memory was not freed up at previous step?

If I execute update in one thread (without taskExecutor, simply vehicleList.foreach(vehicle -> VehicleDataUpdater(vehicle)); ) than I didn't see any memory growing. After update each vehicle memory is cleared.

I didn't find any issues with memory leaks for ThreadPoolTaskExecutor or ThreadPoolTaskScheduler, so I have no idea how to fix it.

What possible ways of not clearing memory after finish scheduler task? How can I look who is locking object after finish? I'm using VisualVM 2.0.1 and didn't find there such possibilities.

EDIT 1:

VehicleService:

public class VehicleService {
    private static VehicleService instance = null;
    private VehicleDao dao;

    public static VehicleService getInstance(){
        if (instance == null) {
            instance = new VehicleService();
        }
        return instance;
    }

    private VehicleService(){}

    public void setDao(VehicleDao vehicleDao) { this.dao = vehicleDao; }

    public List<VehicleEntity> list() {
        return new ArrayList<>(this.dao.list(LocalDateTime.now()));
    }
}

VehicleDao:

@Repository
public class VehicleDao {
    @PersistenceContext(unitName = "entityManager")
    private EntityManager entityManager;

    @Transactional("transactionManager")
    public List<VehicleRegisterEntity> list(LocalDateTime dtPeriod) {
        return this.entityManager.createQuery("SOME_QUERY", VehicleEntity.class).getResultList();
    }
}

InitService:

@Service
public class InitHibernateService {
    private final VehicleDao vehicleDao;

    @Autowired
    public InitHibernateService(VehicleDao vehicleDao){
        this.vehicleDao = vehicleDao;
    }

    @PostConstruct
    private void setDao() {
        VehicleService.getInstance().setDao(this.vehicleDao);
    }
}

EntityManager:

@Bean(name = "entityManager")
@DependsOn("dataSource")
public LocalContainerEntityManagerFactoryBean entityManagerFactory() throws NamingException {
    LocalContainerEntityManagerFactoryBean em = new LocalContainerEntityManagerFactoryBean();
    em.setPersistenceProviderClass(HibernatePersistenceProvider.class);
    em.setDataSource(dataSource());
    em.setPackagesToScan("MY_PACKAGE");
    em.setJpaVendorAdapter(vendorAdapter());
    em.setJpaProperties(hibernateProperties());
    em.setPersistenceUnitName("customEntityManager");
    em.setJpaDialect(new CustomHibernateJpaDialect());
    return em;
}
zhoriq
  • 85
  • 1
  • 3
  • 11
  • 1
    VehicleService.getInstance().getList() .. i think you should inject VehicleService with @autowire – Gewure Apr 23 '20 at 01:41
  • What is `VehicleService.getInstance().getList()` actually doing? Also you probably should be updating/reading things in chunks/lazy instead of lists. Assuming you are using something like JPA you might have another issue with detached entities. All in all there isn't enough information in your question to answer it. – M. Deinum Apr 23 '20 at 06:42
  • @Gewure it's little bit more complex... added code with service, dao, etc. – zhoriq Apr 23 '20 at 09:13
  • @M.Deinum how can I do it with chunks? Also I didn't see problem with list because it's not so huge. 3000 records doesn't seems too much for me. Yes, I'm using JPA and I updated question with it's code. I was thinking about EntityManager memory issue so I'm specially separate entities from EM to the main code with _return new ArrayList<>(...)_ in VehicleService. – zhoriq Apr 23 '20 at 09:18
  • Creating a new list won't help, it are still managed entities. It will load all 3000 at once in memory (and now it are 3000, what about 30000) and also consider that no-process runs alone. Your `ScheduledVehicleDataUpdate` should be a spring managed bean with an `@Scheduled` so that Spring injects the dependencies and use `@Scheduled` to schedule things. You should be working WITH the framework you are currently working around it. Regarding the chunks, that might not totally work with your current solution. Also do you need all those small tasks? Why not just sequentially do the updates? – M. Deinum Apr 23 '20 at 09:26
  • 1
    It all looks a lot like trying to optimize something that doesn't need optimizing and only makes things overly complex. – M. Deinum Apr 23 '20 at 09:27
  • @M.Deinum yes, creating new list doesn't have sense, I agree. I've used complex scheduler because I'm loading delay value from properties file that can be changed on-the-fly. Possibly my scheduler is not optimal but it can't affect memory leaks. Or can? Sequential update will take longer time (now it's 23 thread = 0.5min, with 1 thread = approx 11.5min) – zhoriq Apr 23 '20 at 11:08
  • i am missing a @ service above your VehicleService. I see you did a Singleton pattern there, but that does really not make a difference: even Singleton services are injected using @ Autowire and @ Service --- Spring makes sure that actually only ONE instance of it runs. see: https://stackoverflow.com/questions/2173006/should-service-layer-classes-be-singletons so i think its a bit a mixture of instantiating manually, spring lifecycle and multithreading, leading to leaks - hard to tell what exactly leaks, but as long as you don't use Constructor-Injection, @ Service and @ autowire ..! – Gewure Apr 23 '20 at 13:40
  • When doing 1 thread, do chunks processing, (ie flush after x records and clear the cache). Read streaming instead of a full list. This might still be a little slower but easier to maintain. The main issue with a single thread is the dirty checking of JPA which becomes a bottleneck (hence the flush and clear after x records). Also you cannot change anything here after startup, you can use the same with `@Scheduled` and read from properties file, so there is nothing preventing you from doing proper design. – M. Deinum Apr 23 '20 at 14:01

1 Answers1

2

Looking at what you are trying to achieve is basically optimal batch processing when using JPA. However you are trying to use a canon (multi-threading) instead of solving the actual issue. For a nice overview I strongly suggest a read of [this blog post][1].

  1. Use chunk processing and flush the entity manager after x records and then clear. This prevents you from doing a lot of dirty checks in the first level cache
  2. Enable Batch statements on hibernate as well as ordering inserts and updates

First of all start with the properties make sure that your hibernateProperties contains the following

hibernate.jdbc.batch_size=25
hibernate.order_inserts=true
hibernate.order_updates=true

Then rewrite your ScheduledVehicleDataUpdate to take advantage of this and periodically flush/clear the entitymanager.

@Component
public class ScheduledVehicleDataUpdate {
    @PersistenceContext
    private EntityManager em;

    @Scheduled(fixedDelayString="${your-delay-property-here}")
    @Transactional
    public void run() {
        try {
            List<VehicleEntity> vehicleList = getList();
            for (int i = 0 ; i < vehicleList.size() ; i++) {
              updateVehicle(vehicleList.get(i));
              if ( (i % 25) == 0) {
                em.flush();
                em.clear();
              }
            }
        }
    }

    private void updateVehicle(Vehicle vehicle) {
       // Your updates here
    }

    private List<VehicleEntity> getList() {
        return this.entityManager.createQuery("SOME_QUERY", VehicleEntity.class).getResultList();
    }
}

Now you could also decrease the memory consumption of the getList by making that a bit more lazy (i.e. only retrieve the data when you need it). You can do this by tapping into hibernate and use a stream method (as of Hibernate 5.2) or when using older versions do a bit more work and use a ScrollableResult (see Is there are way to scroll results with JPA/hibernate? ). If you are already on JPA 2.2 (i.e. Hibernate 5.3) you can use getResultStream directly.

private Stream<VehicleEntity> getList() {
  Query q = this.entityManager.createQuery("SOME_QUERY", VehicleEntity.class);
  org.hibernate.query.Query hq = q.unwrap(org.hibernate.query.Query.class);
  return hq.stream();
}

or with JPA 2.2

private Stream<VehicleEntity> getList() {
  Query q = this.entityManager.createQuery("SOME_QUERY", VehicleEntity.class);
  return q.getResultStream();
}

In your code you would need to change the for loop to work with a stream, and keep a counter yourself and still flush periodically. Using a stream will unlikely improve performance (might even degrade it) but will use less memory then when retrieving all elements at once. As you only have as many objects in-memory as you are using for the batch-size!.

@Scheduled(fixedDelayString="${your-delay-property-here}")
    @Transactional
    public void run() {
        try {
            Stream<VehicleEntity> vehicles = getList();
            LongAdder counter = new LongAdder();
            vehicles.forEach(it -> {
              counter.increment();
              updateVehicle(it);
              if ( (counter.longValue() % 25) == 0) {
                em.flush();
                em.clear();
              }
            });
            }
        }
    }

Something like this should do the trick.

NOTE: I typed the code as I went along, this might not compile due to some missing brackets, imports etc.

M. Deinum
  • 115,695
  • 22
  • 220
  • 224