I have a Spring Batch application that uses Azure SQL Server as the backend; I am using Hibernate to update the database.

I am reading the data from a CSV file using a FlatFileItemReader and writing it into Azure SQL Server using an ItemWriter, as shown below.

Below is my Hibernate configuration

<beans
    xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xmlns:p="http://www.springframework.org/schema/p"
    xsi:schemaLocation="http://www.springframework.org/schema/batch
    http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
    http://www.springframework.org/schema/tx
    http://www.springframework.org/schema/tx/spring-tx.xsd
    http://www.springframework.org/schema/context
    http://www.springframework.org/schema/context/spring-context-3.0.xsd
    ">
    
    <context:annotation-config/>
    <context:component-scan base-package="com.demo.entity" />

    <bean id="itemWriter" class="com.demo.batch.jobs.csv.Writer" >
        <constructor-arg ref="hibernateItemWriter"/>
    </bean>

    <bean id="hibernateItemWriter" class="org.springframework.batch.item.database.HibernateItemWriter">
        <property name="sessionFactory" ref="sessionFactory"/>
    </bean>

    <bean id="sessionFactory" class="org.springframework.orm.hibernate5.LocalSessionFactoryBean" >
        <property name="dataSource" ref="irusDataSource"/>
        <property name="hibernateProperties" ref="hibernateProperties"/>
        <property name="packagesToScan">
            <list>
                <value>com.demo.*</value>
            </list>
        </property>
    </bean>

    <!-- DATA SOURCE -->
    <bean id="irusDataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" />
        <property name="url" value="jdbc:sqlserver://sqlserver85.database.windows.net:1433;database=sqldb;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;" />
        <property name="username" value="ddddd" />
        <property name="password" value="`JNp" />
    </bean>

    <bean id="transactionManager" class="org.springframework.orm.hibernate5.HibernateTransactionManager" lazy-init="true">
        <property name="sessionFactory" ref="sessionFactory" />
    </bean>

    <tx:annotation-driven transaction-manager="transactionManager"/>
    <bean id="hibernateProperties" class="org.springframework.beans.factory.config.PropertiesFactoryBean">
        <property name="properties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.SQLServer2012Dialect</prop>
            </props>
        </property>
    </bean>
</beans>
<bean id="cvsFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="classpath:cvs/input/students.csv" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean
                    class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <property name="names" value="student" />
                </bean>
            </property>
            <property name="fieldSetMapper">
                <bean class="com.demo.batch.mapper.StudentMapper" />
            </property>
        </bean>
    </property>
</bean>
<batch:step id="starterJob">
    <batch:tasklet>
        <batch:chunk
                reader="cvsFileItemReader"
                processor="itemProcessor"
                writer="itemWriter"
                commit-interval="100">
        </batch:chunk>
    </batch:tasklet>
</batch:step>

Below is the ItemProcessor

import com.demo.entity.Student;
import org.springframework.batch.item.ItemProcessor;

public class Processor implements ItemProcessor<Student, Student> {

    @Override
    public Student process(Student item) throws Exception {
        
        System.out.println("Processing..." + item);
        //Thread.sleep(50);
        return item;
    }

} 

Below is the ItemWriter

import java.util.List;

import com.demo.entity.Student;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.HibernateItemWriter;

public class Writer implements ItemWriter<Student> {

    private final HibernateItemWriter<Student> hibernateItemWriter;

    public Writer(HibernateItemWriter<Student> hibernateItemWriter) {
        this.hibernateItemWriter = hibernateItemWriter;

        System.out.println("Hibernate Writer instance is created..: " + hibernateItemWriter.hashCode());
    }

    @Override
    //@Transactional(propagation = Propagation.REQUIRES_NEW)
    public void write(List<? extends Student> list) throws Exception {
        hibernateItemWriter.write(list);
    }
}

The CSV file has 6k records. Sometimes the job completes in about 4 minutes; other times it takes around 15 minutes.

How can the same code, running on the same VM, show such different performance? Is this due to Azure SQL performance being inconsistent? How can I make sure the job always finishes in around 4 minutes?

Stats for the 4-minute execution:

19100 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
15127700 nanoseconds spent preparing 200 JDBC statements;
4103544900 nanoseconds spent executing 200 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
2099567900 nanoseconds spent executing 1 flushes (flushing a total of 100 entities and 0 collections);

Stats for the 15-minute execution:

19400 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
15668300 nanoseconds spent preparing 200 JDBC statements;
13671456600 nanoseconds spent executing 200 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
6881730800 nanoseconds spent executing 1 flushes (flushing a total of 100 entities and 0 collections);
One Developer

1 Answer


Simple inserts of 6K records taking this much time does not sound right. You can try enabling Hibernate statistics (see here); they give you an idea of how much time Hibernate spends on its internal tasks versus executing SQL. You will see something like the output below:

2021-11-08 21:58:18 - Session Metrics {
    4918700 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    20 nanoseconds spent preparing 1 JDBC statements;
    300 nanoseconds spent executing 1 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
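
If statistics are not enabled yet, a minimal way to turn them on with the XML configuration from the question is to add hibernate.generate_statistics to the hibernateProperties bean (a sketch against the config above; hibernate.generate_statistics is a standard Hibernate property):

<bean id="hibernateProperties" class="org.springframework.beans.factory.config.PropertiesFactoryBean">
    <property name="properties">
        <props>
            <prop key="hibernate.dialect">org.hibernate.dialect.SQLServer2012Dialect</prop>
            <!-- collect statistics and log per-session metrics like the block shown above -->
            <prop key="hibernate.generate_statistics">true</prop>
        </props>
    </property>
</bean>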

Also it is worth checking the network latency between the VM running the code and the SQL Server (for example, if they are in different regions you may be paying a large per-round-trip penalty when talking to the database, especially if your SQL statements are not being batched).
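
If batching is indeed missing, one option worth trying is Hibernate's JDBC batching, so that each 100-item chunk goes out as a few batched inserts instead of one round trip per row. This is a sketch against the hibernateProperties bean from the question, not a tested configuration; the batch_size value of 100 simply mirrors the step's commit-interval:

<bean id="hibernateProperties" class="org.springframework.beans.factory.config.PropertiesFactoryBean">
    <property name="properties">
        <props>
            <prop key="hibernate.dialect">org.hibernate.dialect.SQLServer2012Dialect</prop>
            <!-- group inserts/updates into JDBC batches instead of one statement per round trip -->
            <prop key="hibernate.jdbc.batch_size">100</prop>
            <prop key="hibernate.order_inserts">true</prop>
            <prop key="hibernate.order_updates">true</prop>
        </props>
    </property>
</bean>

The "0 nanoseconds spent executing 0 JDBC batches" lines in the statistics posted in the question suggest that nothing is batched today. Also note that Hibernate silently skips insert batching for entities whose id uses an IDENTITY generator, so this only helps if Student uses a sequence, table, or assigned id.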

Shailendra
  • Updated the question with stats; query execution and flushing are where the extra time goes, but I don't know why. Can you suggest anything? – One Developer Nov 09 '21 at 07:04
  • What is the difference between query execution and flushing? – One Developer Nov 09 '21 at 07:13
  • So in your case the major time is being spent in the VM itself, consuming CPU and memory. Query execution is the time spent running the query; flushing is how Hibernate checks the objects and their relations for dirtiness (i.e., what has changed). Also, it looks like you are not using SQL batching. – Shailendra Nov 09 '21 at 07:22
  • "13671456600 nanoseconds spent executing 200 JDBC statements" can only happen if your SQL statements are not batched and are executed one by one. – Shailendra Nov 09 '21 at 07:25
  • Check https://learn.microsoft.com/en-us/azure/azure-sql/performance-improve-use-batching for the impact and best practices related to batching for Azure SQL. – Shailendra Nov 09 '21 at 07:38
  • Thank you. Is there a way for me to convert the code to use batch processing? I am using the HibernateItemWriter.write method. – One Developer Nov 09 '21 at 07:54