2

Let's say I have a program where I am inserting records into MYSQL table in ddatabase in Java.

Instead of inserting row by row, I insert by a batch of 1000 records. Using ExecuteBatch method, it doesn't seem to work as it still inserts row by row.

Code(only the snippet):

 public void readDataBase(String path,String word) throws Exception {
        try {


            Class.forName("com.mysql.jdbc.Driver");

            connect = DriverManager
                    .getConnection("jdbc:mysql://126.32.3.20/fulltext_ltat?"
                            + "user=root&password=root");
   String sql="insert IGNORE into  fulltext_ltat.indextable values (default,?, ?) ";

            preparedStatement = connect.prepareStatement(sql);

        for(int i=0;i<1000;i++) {
            preparedStatement.setString(1, path);
            preparedStatement.setString(2, word);
            preparedStatement.addBatch();



            if (i % 1000 == 0) {

                preparedStatement.executeBatch();
                System.out.print("Add Thousand");
            }

        }










        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            try {
                preparedStatement.close();
                connect.close();
            }
            catch (SQLException e) {
                e.printStackTrace();
            }

        }

    }

Code: Main method calling the above

public static void main(String[] args) throws Exception {

    StopWatch stopwatch = new StopWatch();
    stopwatch.start();


    File folder = new File("D:\\PDF1");
    File[] listOfFiles = folder.listFiles();

    for (File file : listOfFiles) {
        if (file.isFile()) {
            HashSet<String> uniqueWords = new HashSet<>();
            String path = "D:\\PDF1\\" + file.getName();
            try (PDDocument document = PDDocument.load(new File(path))) {

                if (!document.isEncrypted()) {

                    PDFTextStripper tStripper = new PDFTextStripper();
                    String pdfFileInText = tStripper.getText(document);
                    String lines[] = pdfFileInText.split("\\r?\\n");
                    for (String line : lines) {
                        String[] words = line.split(" ");

                        for (String word : words) {
                            uniqueWords.add(word)
                            ;

                        }

                    }
                    // System.out.println(uniqueWords);

                }
            } catch (IOException e) {
                System.err.println("Exception while trying to read pdf document - " + e);
            }
            Object[] words = uniqueWords.toArray();



            MysqlAccessIndex connection = new MysqlAccessIndex();

            for(int i = 1 ; i <= words.length - 1 ; i++ ) {

                connection.readDataBase(path, words[i].toString());

            }

            System.out.println("Completed");

        }
    }

The moment I run the program, the if statement is always executing rather than checking if there are 1000 records then only execute to insert to db.

Am I doing anything wrong?

Daredevil
  • 1,672
  • 3
  • 18
  • 47

2 Answers2

3

i % 1000 == 0 is true when i==0, so you only execute the batch in the first iteration of the loop.

You should execute the batch after the loop:

    for (int i=0;i<1000;i++) {
        preparedStatement.setString(1, path);
        preparedStatement.setString(2, word);
        preparedStatement.addBatch();
    }
    preparedStatement.executeBatch();
    System.out.print("Add Thousand");

Now, if you had 10000 records, and you wanted to execute batch insert every 1000, you could write:

    for (int i=0;i<10000;i++) {
        preparedStatement.setString(1, path);
        preparedStatement.setString(2, word);
        preparedStatement.addBatch();
        if ((i + 1) % 1000 == 0) {
            preparedStatement.executeBatch();
            System.out.print("Add Thousand");
        }
    }

EDIT: In order not to insert the same word multiple times to the table, pass an array to your method:

Change

        for(int i = 1 ; i <= words.length - 1 ; i++ ) {
            connection.readDataBase(path, words[i].toString());
        }

to

        connection.readDataBase(path, words);

and

public void readDataBase(String path,String word) throws Exception {

to

public void readDataBase(String path,String[] words) throws Exception {

and finally the batch insert loop would become:

    for (int i=0;i<words.length;i++) {
        preparedStatement.setString(1, path);
        preparedStatement.setString(2, words[i]);
        preparedStatement.addBatch();
        if ((i + 1) % 1000 == 0) {
            preparedStatement.executeBatch();
            System.out.print("Add Thousand");
        }
    }
    if (words.length % 1000 > 0) {
        preparedStatement.executeBatch();
        System.out.print("Add Remaining");
    }
Eran
  • 387,369
  • 54
  • 702
  • 768
  • What about the if statement to execute insert the batch after it hits 1000? – Daredevil Oct 31 '18 at 07:13
  • I think I might have made a big mistake. Because the code I posted is called from the main method through a for loop. So using your solution, it inserts the 1st record 1000 times and followed by the 2nd records ,...... etc . So I need to make sure the each word inserts only once – Daredevil Oct 31 '18 at 07:15
  • @Daredevil oh, I didn't notice that. If you want `readDataBase` to insert different records, perhaps you should pass to it to String arrays instead of two Strings, and only call it once from your `main` – Eran Oct 31 '18 at 07:18
  • I edited my post to include the main method code. Perhaps you can have a look and tell me what i did wrong – Daredevil Oct 31 '18 at 07:19
  • There's a problem in connection.readDataBase(path, words); as it does String[] cannot be applied to Object[] which is what I defined my words in. I could convert my hashset to array but it couldn't be stored in array but only in objects[] and I don't know why – Daredevil Oct 31 '18 at 07:27
  • @Daredevil I was confused by the other `String[] words` variable you have in your `main`. You can easily change `Object[] words = uniqueWords.toArray();` to `String[] words = uniqueWords.toArray(new String[uniqueWords.size()]);` – Eran Oct 31 '18 at 07:29
  • I see, so you are using the same String[] words as the one for String[] words =line.split.... am i right? – Daredevil Oct 31 '18 at 07:31
  • @Daredevil No. I suggested (in my last comment) you change `Object[] words` to a `String[] words`. – Eran Oct 31 '18 at 07:33
  • Okay I ran your code, it works fine. Just a question regarding the loops, let's say if I have 20,000 words, does it insert batch of 1000 each? – Daredevil Oct 31 '18 at 07:34
  • @Daredevil yes. – Eran Oct 31 '18 at 07:35
  • Can I verify, once I start inserting, should it insert the records to the table straight or I could only see records in the table after 1000 records have been added to "batch" ? Because before the records hits 1000, i could already see records in the table. – Daredevil Oct 31 '18 at 07:36
  • @Daredevil it's possible your DB is not configured to support batch insert. Please check [this question](https://stackoverflow.com/questions/2993251/jdbc-batch-insert-performance) to find out how the DB parameters should be configured. – Eran Oct 31 '18 at 07:39
  • Ok thanks. One thing I noticed is there is an extra record of null for the word in my first row, how do i remove it? Can you explain why you need two for loops in your answer? – Daredevil Oct 31 '18 at 07:41
  • @Daredevil there's only one for loop. And as for the null first word, you can add a check that a `word` is not null or empty before calling `uniqueWords.add(word)` – Eran Oct 31 '18 at 07:43
  • My bad, I meant to say if statement, why do you need the second one to add remaining batch, that's the part I am confused. Like, why use (i+1) but not just i? – Daredevil Oct 31 '18 at 07:45
  • @Daredevil what if the number of words is 7005? The loop will insert 7 batches of 1000 words, but it won't insert the last 5. – Eran Oct 31 '18 at 07:46
  • @Daredevil and i+1 instead of i because you want to insert the first batch of 1000 words (having indices 0 to 999) when i==999, so (i+1) % 1000 == 0. – Eran Oct 31 '18 at 07:49
  • A simple modification I think would work: change int i=1 in the for loop and in the if statement, change to i%1000==0 , that would be the same right? – Daredevil Oct 31 '18 at 07:52
  • @Daredevil that would skip the first word. – Eran Oct 31 '18 at 07:54
  • Then it's strange because I tested this way and the number of record in my table checks out correctly. – Daredevil Oct 31 '18 at 07:56
0

In the configuration property url add: allowMultiQueries=true

Simon
  • 29
  • 7