1

I create a program that inserts to MySql database millions of values. I read about batch insert that will optimize my program and make it faster but when I tried to do it, it worked in the same way. Instead of inserting each value to the database I kept in a list each time 500 values and then insert them in one big loop like this:

for(int i=0;i<500;i++)
{
   insertData(list.get(i));
}

Then i remove all the values in the list and start collecting 500 values again. Shouldn't it work better?
My Insert code is:

public void insertToNameTable(String id,String name) throws SQLException
       {
           PreparedStatement ps=null;

            ps= conn.prepareStatement("INSERT INTO NameTable values(?,?,?)",user.getId(),user.getName());


            ps.setString(1,id);
            ps.setString(2,name);
            ps.setBoolean(3,false);
            ps.executeUpdate();

       }

I have some questions:
1.why it isn't work faster when i do batch insert?
2.how many values I should enter each time in order to make it faster?(500,1000,10000) the more values enter together is better?
3. is the way I insert the values to my database is the best way?

user3100708
  • 148
  • 3
  • 19
  • sql optimization is smarter than you. You trivial example will not fool the optimzer. – Scary Wombat Dec 18 '14 at 08:43
  • @ScaryWombat So how should i insert values to the database in the best way? – user3100708 Dec 18 '14 at 08:45
  • Post your batch insert program. Also my insertData(list.get(i)) are you saying that list.get(i) returnsa list of 500 values? Then how do you insert those 500 values. Are you using the something like: "insert into table (x,y,z) values (1,2,3), (3,4,5),...." – dan b Dec 18 '14 at 09:02
  • @danb list.get(i) return to me one value. the list size is 500. In my program instead of doing insert after insert i collect each time 500 values in the list and then insert them one after one. i understood that it isn't the right way but how should i do it. how many values should i insert each time to optimize it? – user3100708 Dec 18 '14 at 09:04
  • But you do the same thing? you create 500 different insert statement. that's not a bulk insert – MihaiC Dec 18 '14 at 09:07
  • @MihaiC yes. i knew it isn't the right way, so for that i asked this question. – user3100708 Dec 18 '14 at 09:09
  • What is the point of doing what you are doing. You are still inserting one at a time. More interesting might be to do what I suggested in the comment. But first post you batch code. Also take a look at http://stackoverflow.com/a/15889745/4350148 – dan b Dec 18 '14 at 09:10
  • @danb i edit the question take a look. i want to insert 500 id's and 500 names in one statemant.. – user3100708 Dec 18 '14 at 09:16
  • @user3100708 You marked http://stackoverflow.com/a/27543087/4350148 as correct, but it is using statements not preparedstatements. It's especially important to use preparedstatements here. I recommend asking the poster of that answer to modify it using preparedstatements or uncheck acceptance. – dan b Dec 18 '14 at 11:06

2 Answers2

2

This is the efficient way for batch insert.

Connection connection = new getConnection();
Statement statement = connection.createStatement();
 
for (String query : queries) {
    statement.addBatch(query);
}
statement.executeBatch();
statement.close();
connection.close();
Divyang Shah
  • 1,578
  • 1
  • 11
  • 22
  • That what i was looking for But how many queries should i give him each time to have the best optimization. 500?1000?10000? in which way it will work in the best way? – user3100708 Dec 18 '14 at 09:25
  • you can decide it based on your requirements. try to make it as possible as less itterations – Divyang Shah Dec 18 '14 at 09:35
  • so what you are saying is that if i can send each time 10k queries it will be better than send 1k queries each time? – user3100708 Dec 18 '14 at 09:37
  • yes, exactly what you have understand. It will decrease no. of i/o itteration. But one thing that you need to take in mind is that it should not throw heap space excepion. – Divyang Shah Dec 18 '14 at 09:38
  • How this batch insert relate to heap space exception? i got this exception sometimes and never understood why. can you explain that? – user3100708 Dec 18 '14 at 09:40
  • heap space exception comes when you store more data than available space provided by jvm. for ex. you adding data in string builder with more data than space allocated by jvm. so, you need to make string builder free. so. other resources can use that memory. – Divyang Shah Dec 18 '14 at 09:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/67205/discussion-between-user3100708-and-divyang-shah). – user3100708 Dec 18 '14 at 09:46
  • Its especially important to use prepared statements in a case like this. I recommend changing your answer to use prepared statements – dan b Dec 18 '14 at 10:25
  • why do you recommend to use prepared statement? – Divyang Shah Dec 18 '14 at 11:36
0

Questions 1&2:

User Neil Coffey some time ago said:

The notion that prepared statements are primarily about performance is something of a misconception, although it's quite a common one.

Another poster mentioned that he noted a speed improvement of about 20% in Oracle and SQL Server. I've noted a similar figure with MySQL. It turns out that parsing the query just isn't such a significant part of the work involved. On a very busy database system, it's also not clear that query parsing will affect overall throughput: overall, it'll probably just be using up CPU time that would otherwise be idle while data was coming back from the disk.

So as a reason for using prepared statements, the protection against SQL injection attacks far outweighs the performance improvement. And if you're not worried about SQL injection attacks, you probably should be...

Here is the original post:
PreparedStatements and performance
and in my opinion all answers are worth reading. I think you expect PreparedStatement to be some kind of magician that would increase your inserting speed significantly and that is why you are disappointed with the improvement you get.

Question 3:
The proper way of using PreparedStatement is preparing a statement and then setting values and updating database in a loop. Here is a good example: Reusing a PreparedStatement multiple times

Community
  • 1
  • 1
spoko
  • 783
  • 1
  • 10
  • 24
  • My experience is that PreparedStatements significantly improve performance and it is a very important consideration when implementing systems.The benchmarks sited are simply anecdotal. My concern is that we not perpetuate the myth that prepared statements don't improve performance that much. Benchmarks for this would have to take into account, running complex queries and also running similar queries thousands or hundreds of thousands of times with different parameters. Each query would be added to the server query cache, filling it up to it's limit. – dan b Dec 18 '14 at 10:23
  • @danb I agree with reducing cache size & database load and I always prefer `PreparedStatement` to `Statement`. I'm not Neil Coffey and I don't want to speak for him, but please notice: two answers with highest vote count in thread I provided are kinda different. Also, my answer here got upvoted and downvoted. It looks like opinions on this topic vary and it is difficult to definitely claim who is right and who is wrong. As I said, I always use `PreparedStatement` as I was taught it works faster so I can't compare these two. I think trying both on user's environment would be the best benchmark. – spoko Dec 18 '14 at 10:42
  • Both answers downplay the role of performance improvements when using preparedstatements. My concern is that your answer will perpetuate the implication that preparedstatements do not significantly increase performance. Even without SQL injection I would use it for performance gains. In the question here, I'm not sure what value is being added. And I have built and tested many production systems and PreparedStatements performance improvements are a must! – dan b Dec 18 '14 at 10:50