We're building a CSV export mechanism in a CMS we just built, and we're having trouble keeping memory down on result sets with millions of rows. What is the best way to write a 1MM+ row result from MySQL (via the mysql2 gem) to CSV? If it makes any difference, we're on Ruby 1.9.3p194 and MySQL 5.5. Our main goal is to keep memory consumption down on the Ruby process.
3

giantcola
3 Answers
7
By default mysql2 caches the Ruby objects that it creates for its result set. If you turn this off, memory consumption should be greatly reduced. For example, you might do:
require 'mysql2'
require 'csv'

client = Mysql2::Client.new(:host => "localhost", :username => "someuser")
client.query 'use some_db'

CSV.open('output.csv', 'w') do |csv|
  results = client.query("SELECT * FROM big_table")
  # :cache_rows => false tells mysql2 not to hold on to each row after yielding it
  results.each(:cache_rows => false) do |row|
    csv << [row['column_a'], row['column_b']]
  end
end

Frederick Cheung
- I'm already passing stream: true when executing the command, which according to the docs ignores the cache_rows setting, since streaming should force it not to cache rows. Unfortunately this doesn't seem to be the case. – giantcola Jul 02 '12 at 18:48
- Are you using master? The version currently on RubyGems (0.3.11) predates the :stream option. – Frederick Cheung Jul 02 '12 at 19:49
- Ah, we are not using master. Thanks for bringing this to my attention! – giantcola Jul 02 '12 at 19:57
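Putting the two comments above together, a minimal streaming sketch, assuming mysql2 0.3.12 or later (where the :stream option is available) and the same hypothetical table and columns as in the answer:

require 'mysql2'
require 'csv'

client = Mysql2::Client.new(:host => "localhost", :username => "someuser", :database => "some_db")

CSV.open('output.csv', 'w') do |csv|
  # :stream => true fetches rows from the server lazily instead of loading the
  # whole result set; :cache_rows => false stops mysql2 from keeping a Ruby
  # copy of each row after it has been yielded.
  results = client.query("SELECT * FROM big_table", :stream => true, :cache_rows => false)
  results.each do |row|
    csv << [row['column_a'], row['column_b']]
  end
end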
4
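This answer recommends MySQL's SELECT ... INTO OUTFILE (discussed in the comments below): the server writes the CSV itself, so the rows never pass through the Ruby process at all. A minimal sketch issued through mysql2, with hypothetical column and path names:

require 'mysql2'

client = Mysql2::Client.new(:host => "localhost", :username => "someuser", :database => "some_db")

# The MySQL server writes the file on its own filesystem, so the connecting
# account needs the FILE privilege and the path must be writable by mysqld.
client.query(<<-'SQL')
  SELECT column_a, column_b
  INTO OUTFILE '/tmp/output.csv'
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
  FROM big_table
SQL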

eggyal
- Yes, this is the best solution. Look at the `FIELDS TERMINATED BY` option. It's tab-separated by default, so you'll have to set it to ','. Also, you can only write to the server's file system, which could be a deal breaker if you need to do this remotely. http://dev.mysql.com/doc/refman/5.0/en/select-into.html – Joshua Martell Jun 30 '12 at 00:56
- Good to know this option exists; unfortunately it won't work in my current application, as I have to do a little bit of processing on each result row before appending it to the CSV. – giantcola Jul 02 '12 at 18:49
- @eggyal I have to run some of the SQL values through app-level logic before exporting to make the resulting CSV more human/Excel-friendly. While it may be possible to run most if not all of the processing in SQL, I'd rather not tie my business logic into my query like that. – giantcola Jul 02 '12 at 19:09
0
If you don't want to use eggyal's OUTFILE solution for some reason, I would split the query into batches using LIMIT and OFFSET:
SELECT ... LIMIT 5000 OFFSET 5000
http://dev.mysql.com/doc/refman/5.0/en/select.html
The memory consumption comes from creating one large Ruby array for the whole 1M+ row query. Splitting it into smaller batches and dumping each batch in turn into the CSV file will keep memory consumption lower.
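A rough sketch of that batching loop, reusing the hypothetical table, columns, and connection settings from the first answer:

require 'mysql2'
require 'csv'

client = Mysql2::Client.new(:host => "localhost", :username => "someuser", :database => "some_db")
batch_size = 5000

CSV.open('output.csv', 'w') do |csv|
  offset = 0
  loop do
    # Only batch_size rows are materialized in Ruby at any one time.
    rows = client.query("SELECT * FROM big_table LIMIT #{batch_size} OFFSET #{offset}")
    break if rows.count == 0
    rows.each do |row|
      csv << [row['column_a'], row['column_b']]
    end
    offset += batch_size
  end
end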

Casper
- It'd be pretty hard work for MySQL to scan through the results in order to find each offset. Better to filter the data in batches based on some indexed column. – eggyal Jun 30 '12 at 00:49
- @eggyal - Ok. Thanks. Wasn't aware of the performance penalty. I also found this while researching further about the performance: http://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down . Anyway it seems like `OUTFILE` will be much easier and faster. – Casper Jun 30 '12 at 00:57
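For completeness, a sketch of the indexed-column batching eggyal describes in the comment above, assuming a hypothetical auto-increment primary key id; it avoids the penalty of scanning past skipped rows that OFFSET incurs:

require 'mysql2'
require 'csv'

client = Mysql2::Client.new(:host => "localhost", :username => "someuser", :database => "some_db")
batch_size = 5000

CSV.open('output.csv', 'w') do |csv|
  last_id = 0
  loop do
    # Seek on the indexed id column rather than paging with OFFSET, so MySQL
    # jumps straight to the next batch via the index.
    rows = client.query("SELECT * FROM big_table WHERE id > #{last_id} ORDER BY id LIMIT #{batch_size}")
    break if rows.count == 0
    rows.each do |row|
      csv << [row['column_a'], row['column_b']]
      last_id = row['id']
    end
  end
end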