I am trying to import a CSV file into a MySQL database, and I wrote a Groovy script that does it through Hibernate. The problem is that the CSV file is huge: about 48 million lines. At first I opened a session and committed a transaction for each line, but judging by the rate at which rows were being added after the script had been running for a day, importing everything would take more than a month. I figured the bottleneck was opening a session and committing a transaction for every line, so I tried the opposite: open a single session, read every line, instantiate an entity for each one, save it in the session, and after reading all 48 million lines and storing all 48 million entities in the session, commit one single transaction with all of them. That didn't work either. Is there any way to import this CSV file into a MySQL database much faster, or should it really take that long?
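For reference, the per-line pattern described above boils down to something like this sketch (CsvRow is a hypothetical mapped entity and the semicolon delimiter is an assumption, not the asker's actual code); every line pays for its own session, transaction, and commit:

import org.hibernate.Session
import org.hibernate.SessionFactory
import org.hibernate.cfg.Configuration

SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory()

new File( 'my.csv' ).splitEachLine( /;/ ){ parts ->
    Session session = sessionFactory.openSession()   // a new session per line...
    session.beginTransaction()
    session.save( new CsvRow( parts ) )              // CsvRow is a hypothetical mapped entity
    session.transaction.commit()                     // ...and a commit per line: ~48M round trips
    session.close()
}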
- Do you mean importing? – Scary Wombat Sep 01 '14 at 08:15
- Is it "export FROM mysql" or "IMPORT to mysql"? – cfrick Sep 01 '14 at 08:15
- I don't know how to do it with Hibernate, but you can easily import a CSV file into MySQL with LOAD DATA INFILE: http://stackoverflow.com/questions/14127529/mysql-import-data-from-csv-using-load-data-infile – luanjot Sep 01 '14 at 08:18
- Thanks, I meant importing. – pedrobisp Sep 01 '14 at 10:31
- It worked, took about 3 hours and 19 minutes to import 48 million lines. Thanks a lot everyone. – pedrobisp Sep 01 '14 at 14:17
3 Answers
I recommend that you do not use Java and instead use LOAD DATA INFILE 'c:/tmp/discounts.csv' ...
See http://www.mysqltutorial.org/import-csv-file-mysql-table/
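If you still want to drive it from the Groovy script, a minimal sketch that issues the same statement through JDBC might look like this (the database, table, column list, file path, and credentials are all assumptions; LOCAL and allowLoadLocalInfile=true are needed when the file lives on the client machine):

import groovy.sql.Sql

def db = Sql.newInstance(
    'jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true',  // assumed connection details
    'user', 'password', 'com.mysql.jdbc.Driver' )

// LOAD DATA is executed by the server (or streamed by the driver with
// LOCAL) and skips the per-row round trips entirely.
db.execute '''
    LOAD DATA LOCAL INFILE '/tmp/my.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ';'
    LINES TERMINATED BY '\\n'
    (a, b, c)
'''
db.close()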

Scary Wombat
Hibernate in a script? You must be kidding! :)
I would stick with groovy.sql.Sql, because it's simple and flexible to use and doesn't have Hibernate's burden of mapping and session flushing.
Your script might look like the following:
Sql db = new Sql(...) // the connection needs autoCommit disabled for the manual commits below
int flushCounter = 0
new File( 'my.csv' ).splitEachLine( /;/ ){ parts -> // or a full-blown csv-parser here
    db.executeInsert 'insert into mytable (id,a,b,c) values (null,?,?,?)', parts
    flushCounter++
    if( 0 == flushCounter % 1000 ) db.commit() // commit every 1000 records
}
db.commit() // commit the remainder
db.close()
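The other big win is JDBC batching; a variant sketch of the same loop using Sql.withBatch (the connection details and table are the same assumptions as above; rewriteBatchedStatements is a MySQL Connector/J option that collapses a batch into one multi-row INSERT):

import groovy.sql.Sql

def db = Sql.newInstance(
    'jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true',  // assumed connection details
    'user', 'password', 'com.mysql.jdbc.Driver' )

// withBatch groups the inserts into JDBC batches of 1000 and sends each
// batch to the server in a single round trip.
db.withBatch( 1000, 'insert into mytable (id,a,b,c) values (null,?,?,?)' ){ ps ->
    new File( 'my.csv' ).splitEachLine( /;/ ){ parts ->
        ps.addBatch( parts )
    }
}
db.close()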

injecteer
I found a very interesting link with a solution to your problem. Maybe it will be useful: http://chriseiffel.com/everything-linux/how-to-import-a-large-csv-file-to-mysql/

mateusz.zbikowski
- Could you bring the content of the link into your answer to ensure future-proofing against dead links? Thanks – JustinJDavies Sep 01 '14 at 08:50