
For a Spring Boot application backed by a PostgreSQL 9.6 database, I'm using Spring Batch to schedule a few operations that must run every n seconds (configurable, but usually ranging between a few seconds and a few minutes). As a result, by the end of the day the system has performed a lot of jobs and Spring Batch has persisted a lot of metadata about them.

The fact is that I'm not really interested in keeping a history of those jobs, so at first I used the in-memory version of the Spring Batch job repository to avoid persisting this (to me) useless information at all.

However, for configurations with a small n running on environments with limited resources, this approach led to performance issues, so I decided to switch to the database-backed repository.

Unfortunately, those tables grow quite fast, and I would like to implement a cleanup procedure that gets rid of all data older than, for instance, a day.

Here comes the pain: even though nothing is locking those tables (the main application is down and no one is interacting with the database), cleaning them takes forever, and I really cannot understand why.
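To rule locks out rather than assume it, I run something like the following from a second session while a DELETE is hanging (the column list is just what I found useful; `pg_locks` and `pg_stat_activity` are standard catalog views):

```sql
-- Run in a second session while the slow DELETE is executing.
-- Shows which backends hold or are waiting for locks on the batch tables.
SELECT a.pid,
       a.state,
       a.wait_event_type,
       l.relation::regclass AS locked_table,
       l.mode,
       l.granted,
       a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation::regclass::text ILIKE 'batch_%'
ORDER BY l.granted, a.pid;
```

In my case this shows no ungranted locks, which is why I believe locking is not the problem.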

Spring Batch (4.0.1) provides the following PostgreSQL script to generate those tables:

CREATE TABLE BATCH_JOB_INSTANCE  (
    JOB_INSTANCE_ID BIGINT  NOT NULL PRIMARY KEY ,
    VERSION BIGINT ,
    JOB_NAME VARCHAR(100) NOT NULL,
    JOB_KEY VARCHAR(32) NOT NULL,
    constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY)
) ;

CREATE TABLE BATCH_JOB_EXECUTION  (
    JOB_EXECUTION_ID BIGINT  NOT NULL PRIMARY KEY ,
    VERSION BIGINT  ,
    JOB_INSTANCE_ID BIGINT NOT NULL,
    CREATE_TIME TIMESTAMP NOT NULL,
    START_TIME TIMESTAMP DEFAULT NULL ,
    END_TIME TIMESTAMP DEFAULT NULL ,
    STATUS VARCHAR(10) ,
    EXIT_CODE VARCHAR(2500) ,
    EXIT_MESSAGE VARCHAR(2500) ,
    LAST_UPDATED TIMESTAMP,
    JOB_CONFIGURATION_LOCATION VARCHAR(2500) NULL,
    constraint JOB_INST_EXEC_FK foreign key (JOB_INSTANCE_ID)
    references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID)
) ;

CREATE TABLE BATCH_JOB_EXECUTION_PARAMS  (
    JOB_EXECUTION_ID BIGINT NOT NULL ,
    TYPE_CD VARCHAR(6) NOT NULL ,
    KEY_NAME VARCHAR(100) NOT NULL ,
    STRING_VAL VARCHAR(250) ,
    DATE_VAL TIMESTAMP DEFAULT NULL ,
    LONG_VAL BIGINT ,
    DOUBLE_VAL DOUBLE PRECISION ,
    IDENTIFYING CHAR(1) NOT NULL ,
    constraint JOB_EXEC_PARAMS_FK foreign key (JOB_EXECUTION_ID)
    references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;

CREATE TABLE BATCH_STEP_EXECUTION  (
    STEP_EXECUTION_ID BIGINT  NOT NULL PRIMARY KEY ,
    VERSION BIGINT NOT NULL,
    STEP_NAME VARCHAR(100) NOT NULL,
    JOB_EXECUTION_ID BIGINT NOT NULL,
    START_TIME TIMESTAMP NOT NULL ,
    END_TIME TIMESTAMP DEFAULT NULL ,
    STATUS VARCHAR(10) ,
    COMMIT_COUNT BIGINT ,
    READ_COUNT BIGINT ,
    FILTER_COUNT BIGINT ,
    WRITE_COUNT BIGINT ,
    READ_SKIP_COUNT BIGINT ,
    WRITE_SKIP_COUNT BIGINT ,
    PROCESS_SKIP_COUNT BIGINT ,
    ROLLBACK_COUNT BIGINT ,
    EXIT_CODE VARCHAR(2500) ,
    EXIT_MESSAGE VARCHAR(2500) ,
    LAST_UPDATED TIMESTAMP,
    constraint JOB_EXEC_STEP_FK foreign key (JOB_EXECUTION_ID)
    references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;

CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT  (
    STEP_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    SHORT_CONTEXT VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT ,
    constraint STEP_EXEC_CTX_FK foreign key (STEP_EXECUTION_ID)
    references BATCH_STEP_EXECUTION(STEP_EXECUTION_ID)
) ;

CREATE TABLE BATCH_JOB_EXECUTION_CONTEXT  (
    JOB_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    SHORT_CONTEXT VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT ,
    constraint JOB_EXEC_CTX_FK foreign key (JOB_EXECUTION_ID)
    references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;

CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE BATCH_JOB_SEQ MAXVALUE 9223372036854775807 NO CYCLE;

Respecting the foreign-key precedence, I try to clean up those tables by executing the following deletions:

delete from BATCH_STEP_EXECUTION_CONTEXT;
delete from BATCH_STEP_EXECUTION;
delete from BATCH_JOB_EXECUTION_CONTEXT;
delete from BATCH_JOB_EXECUTION_PARAMS;
delete from BATCH_JOB_EXECUTION;
delete from BATCH_JOB_INSTANCE;

Everything is fine with the first four tables but, as soon as I reach BATCH_JOB_EXECUTION, it takes something like 30 minutes to remove a few hundred thousand rows. Even worse, after everything has been deleted from the first five tables, the last one (which at that point references nothing) takes even longer.

Can you see a reason why this simple operation takes so long to complete? Of course it has to check for constraint violations, but it still seems unreasonably slow.
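One hypothesis I'm considering (unconfirmed, hence the question): the DDL above creates no indexes on the foreign-key columns, and PostgreSQL does not index FK columns automatically, so every row deleted from a parent table may trigger a sequential scan of each child table during the constraint check. The index names below are my own invention, not part of the Spring Batch schema:

```sql
-- PostgreSQL does not create indexes on FK columns automatically; without
-- these, each parent-row delete can sequentially scan the child tables.
CREATE INDEX IF NOT EXISTS IDX_JOB_EXEC_INSTANCE
    ON BATCH_JOB_EXECUTION (JOB_INSTANCE_ID);
CREATE INDEX IF NOT EXISTS IDX_STEP_EXEC_JOB_EXEC
    ON BATCH_STEP_EXECUTION (JOB_EXECUTION_ID);
CREATE INDEX IF NOT EXISTS IDX_JOB_EXEC_PARAMS_JOB_EXEC
    ON BATCH_JOB_EXECUTION_PARAMS (JOB_EXECUTION_ID);

-- EXPLAIN ANALYZE reports per-trigger times, including the FK check
-- triggers, so this should show where the time is actually going:
EXPLAIN (ANALYZE, BUFFERS) DELETE FROM BATCH_JOB_EXECUTION;
```

Is this the right track, or is something else going on?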

Also, is there a better way to use Spring Batch without wasting disk space on unneeded job information?
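For context, this is the kind of age-based cleanup I'm ultimately aiming for instead of the wholesale deletes above. It's only a sketch, keyed on BATCH_JOB_EXECUTION.CREATE_TIME, and the one-day interval is an example:

```sql
-- Delete batch metadata for executions created more than one day ago,
-- children first to satisfy the foreign keys. Sketch only.
BEGIN;

CREATE TEMP TABLE old_exec ON COMMIT DROP AS
SELECT JOB_EXECUTION_ID
FROM BATCH_JOB_EXECUTION
WHERE CREATE_TIME < now() - interval '1 day';

DELETE FROM BATCH_STEP_EXECUTION_CONTEXT
WHERE STEP_EXECUTION_ID IN (
    SELECT STEP_EXECUTION_ID FROM BATCH_STEP_EXECUTION
    WHERE JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM old_exec));

DELETE FROM BATCH_STEP_EXECUTION
WHERE JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM old_exec);

DELETE FROM BATCH_JOB_EXECUTION_CONTEXT
WHERE JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM old_exec);

DELETE FROM BATCH_JOB_EXECUTION_PARAMS
WHERE JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM old_exec);

DELETE FROM BATCH_JOB_EXECUTION
WHERE JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM old_exec);

-- Finally, instances left with no executions at all.
DELETE FROM BATCH_JOB_INSTANCE i
WHERE NOT EXISTS (
    SELECT 1 FROM BATCH_JOB_EXECUTION e
    WHERE e.JOB_INSTANCE_ID = i.JOB_INSTANCE_ID);

COMMIT;
```

Given the timings described above, though, I suspect this would be just as slow until the underlying problem is understood.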

AV FMX
  • You could try to clean up all tables at once with a **single** truncate command. But most probably the long time is caused by _other_ activity on the tables and the DELETE commands are waiting to acquire the locks. –  Mar 03 '20 at 07:34
  • I don't think it can be related to db locks because, as mentioned, I'm encountering the same timings even when the application is stopped and thus nothing is using the given database but the pgAdmin instance used to connect to it. – AV FMX Mar 03 '20 at 08:43
  • "*but the pgAdmin instance used to connect to it*" - well maybe _that_ has a lock on one of the tables. While the DELETE is hanging you could check for locks in [`pg_stat_activity`](http://www.postgresql.org/docs/current/static/monitoring-stats.html) –  Mar 03 '20 at 08:46
  • The order of delete statements in your script is correct, see https://github.com/spring-projects/spring-batch/blob/d8fc58338d3b059b67b5f777adc132d2564d7402/spring-batch-test/src/main/java/org/springframework/batch/test/JobRepositoryTestUtils.java#L211. That said, it should not take forever. Try to shutdown pgAdmin and execute your cleanup script from the command line. – Mahmoud Ben Hassine Mar 03 '20 at 09:17
  • try to use `truncate`. Here is a useful discussion about "truncate vs delete": https://stackoverflow.com/questions/11419536/postgresql-truncation-speed/11423886#11423886. If you have a big table, truncate should be better for you. – Dmitrii Bocharov Mar 03 '20 at 10:15
