
I am using Apache Solr 6.4.1. Because I am working with a really big database (over 3 million rows), I would like to add batchSize="-1" in the db-data-config.xml.

But if I do this, it does not work. Without batchSize I can get the first 2k rows, then I get a "java.lang.RuntimeException: java.lang.StackOverflowError" error.

In solrconfig.xml:

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

In db-data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource"
          driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
          url="jdbc:sqlserver://***:1433;integratedSecurity=true;
          Initial Catalog=***;"
          batchSize="-1"/>
...

Why isn't batchSize="-1" working? (batchSize="200" or other values work.)

UPDATE: if I set Debug in the DataImportHandler to false, then it works!

Hamso

1 Answer


I don't think that setting batchSize to '-1' would help in your situation. This is from the source code of Solr's DataImportHandler:

if (batchSize == -1)
  batchSize = Integer.MIN_VALUE;

  [... omissis ...]

Statement statement = c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
statement.setFetchSize(batchSize);
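In other words, batchSize="-1" is translated into a fetch size of Integer.MIN_VALUE before it ever reaches the driver. A minimal sketch of that mapping (FetchSizeDemo and resolveFetchSize are hypothetical names for illustration, not Solr API):

```java
// Hypothetical demo class mirroring the logic quoted above from
// Solr's JdbcDataSource: batchSize -1 becomes Integer.MIN_VALUE,
// every other value is passed through to Statement.setFetchSize().
public class FetchSizeDemo {

    static int resolveFetchSize(int batchSize) {
        if (batchSize == -1) {
            // MySQL-style streaming hint; spec-compliant drivers
            // reject negative fetch sizes with an SQLException.
            return Integer.MIN_VALUE;
        }
        return batchSize;
    }

    public static void main(String[] args) {
        System.out.println(resolveFetchSize(-1));  // -2147483648
        System.out.println(resolveFetchSize(200)); // 200
    }
}
```

This is why batchSize="200" works while batchSize="-1" fails: the MS driver receives Integer.MIN_VALUE, which is not a valid fetch size for it.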

So double-check which parameter values the MS JDBC driver accepts for the setFetchSize method.

setFetchSize - Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement. If the value specified is zero, then the hint is ignored. The default value is zero.

So the driver is free to ignore this hint; maybe it is simply reading in the whole table. You could also try changing the version of your JDBC driver...

I think you should first adapt the value to your network latency and the number of records you want to retrieve on each round trip.

Indexing performance and MS SQL Server load depend on the batchSize. Try starting with a small size and then gradually increase it.
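As a sketch of what that tuning could look like (the connection properties selectMethod=cursor and responseBuffering=adaptive are documented options of the Microsoft JDBC driver that make the fetch-size hint more likely to be honored, but verify them against your driver version):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
          driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
          url="jdbc:sqlserver://***:1433;integratedSecurity=true;
          Initial Catalog=***;selectMethod=cursor;responseBuffering=adaptive;"
          batchSize="500"/>
  ...
</dataConfig>
```

Start around a few hundred rows and raise the value while watching indexing throughput and server load.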

If this doesn't work, try switching to a different JDBC driver.

Returning to the batchSize parameter, there are only a few cases where you don't need it:

  • if your JVM is configured with enough memory to read the entire table
  • if your JDBC driver raises an exception when setFetchSize() is invoked
  • if you're dealing with the MySQL JDBC driver, which has a known bug
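Outside those cases, one defensive pattern is to sanitize the value before calling setFetchSize() (SafeFetchSize is a hypothetical helper for illustration, not part of Solr or JDBC):

```java
// Hypothetical helper: pick a value that is safe to pass to
// Statement.setFetchSize(). The JDBC spec only allows values >= 0;
// MySQL's driver additionally special-cases Integer.MIN_VALUE to
// enable row-by-row streaming.
public class SafeFetchSize {

    static int safeFetchSize(int requested, boolean mysqlStreamingSupported) {
        if (requested == Integer.MIN_VALUE && mysqlStreamingSupported) {
            return requested; // MySQL-only streaming mode
        }
        // Clamp negatives to 0 ("let the driver decide") so that
        // spec-compliant drivers such as Microsoft's don't throw.
        return Math.max(requested, 0);
    }

    public static void main(String[] args) {
        System.out.println(safeFetchSize(Integer.MIN_VALUE, true));  // streaming kept
        System.out.println(safeFetchSize(Integer.MIN_VALUE, false)); // clamped to 0
        System.out.println(safeFetchSize(500, false));               // passed through
    }
}
```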
freedev