
I've got to add around 25000 records to the database at once in Rails, and I have to validate them, too.

Here is what I have so far:

  # controller create action
  def create
    emails = params[:emails][:list].split("\r\n")
    @created_count = 0
    @rejected_count = 0

    inserts = []
    emails.each do |email|
      @email = Email.new(:email => email)
      if @email.valid?
        @created_count += 1
        inserts.push "('#{email}', '#{Date.today}', '#{Date.today}')"
      else
        @rejected_count += 1
      end
    end
    return if emails.empty?
    sql = "INSERT INTO `emails` (`email`, `updated_at`, `created_at`) VALUES #{inserts.join(", ")}"
    Email.connection.execute(sql) unless inserts.empty?
    redirect_to new_email_path, :notice => "Successfully created #{@created_count} emails, rejected #{@rejected_count}"
  end

It's VERY slow right now; there's no way to add that many records because the request times out.

Any ideas? I'm using MySQL.

methyl

3 Answers


Three things come to mind:

  1. You can help yourself with proper tools like zdennis/activerecord-import or jsuchal/activerecord-fast-import. The problem with your example is that you also instantiate 25000 ActiveRecord objects. If you tell activerecord-import not to use validations, it will not create those objects (see activerecord-import/wiki/Benchmarks); there's a sketch after this list.
  2. Importing tens of thousands of rows into a relational database will never be super fast; it should be done asynchronously via a background process. There are tools for that too, like DelayedJob and more: https://www.ruby-toolbox.com/
  3. Move the code that belongs in the model out of the controller™
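
For point 1, a minimal sketch with activerecord-import (assuming gem 'activerecord-import' is in your Gemfile; the column names are taken from your INSERT). Passing raw value rows plus :validate => false keeps it to one multi-row INSERT, with no model objects built:

  emails  = params[:emails][:list].split("\r\n")
  columns = [:email, :created_at, :updated_at]
  values  = emails.map { |e| [e, Time.now, Time.now] }

  # one multi-row INSERT instead of 25000 statements; validations are skipped
  Email.import columns, values, :validate => false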

And after that, you need to rethink the flow of this part of the application. If you're doing background processing inside a controller action like create, you cannot simply return HTTP 201 or HTTP 200. What you need to do is return a quick HTTP 202 Accepted and provide a link to another representation where the user can check the status of their request (do we have a success response yet? how many emails failed?), since it is now being processed in the background. It can sound a bit complicated, and it is, which is a sign that maybe you shouldn't do it like that. Why do you have to add 25000 records in one request? What's the background?
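
A rough sketch of that flow with DelayedJob; the EmailImportJob struct, the EmailImport status model, and the email_import_path route are hypothetical names for illustration, not anything from the question:

  # EmailImportJob, EmailImport and email_import_path are made-up names
  EmailImportJob = Struct.new(:raw_list, :import_id) do
    def perform
      # split the list, validate cheaply, bulk-insert,
      # then store the created/rejected counts on the EmailImport record
    end
  end

  # controller create action
  def create
    import = EmailImport.create!(:status => "pending")
    Delayed::Job.enqueue EmailImportJob.new(params[:emails][:list], import.id)
    # HTTP 202 Accepted plus a link the client can poll for the result
    head :accepted, :location => email_import_path(import)
  end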

Ernest
  • I'd add: how many indices do you have, and are they all necessary? – CambridgeMike May 02 '12 at 20:01
  • It's a very simple site: one form where you put in a list of emails, and they are added to the DB if they are unique, plus another view with the list of emails and a link to download the whole database. Maybe NoSQL would help here? – methyl May 03 '12 at 07:44

Why don't you create a rake task for the work? The following link explains it pretty well.

http://www.ultrasaurus.com/sarahblog/2009/12/creating-a-custom-rake-task/

In a nutshell, once you write your rake task, you can kick off the work by running:

  rake member:load_emails
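
For reference, a minimal version of such a task might look like this; the lib/tasks/member.rake location and the EMAIL_FILE variable are assumptions, only the member:load_emails name comes from the command above:

  # lib/tasks/member.rake (assumed location)
  namespace :member do
    desc "Load emails from a text file, one address per line"
    task :load_emails => :environment do
      # EMAIL_FILE is a made-up variable for this sketch
      File.foreach(ENV.fetch("EMAIL_FILE", "emails.txt")) do |line|
        Email.create(:email => line.strip)  # or swap in a bulk insert for speed
      end
    end
  end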

Yosep Kim

If speed is your concern, I'd attack the problem from a different angle.

  • Create a table that copies the structure of your emails table; call it emails_copy. Don't copy the indexes and constraints.
  • Import the 25k records into it using your database's fast import tools. Consult your DB docs or see e.g. this answer for MySQL. You will have to prepare the input file, but it's much faster this way; I suppose you already have the data in some text or tabular form.
  • Create indexes and constraints on emails_copy to mimic the emails table. Constraint violations, if any, will surface; fix them.
  • Validate the data inside the table. It may take a few raw SQL statements to check for severe errors; you don't have to validate the emails for anything but a very simple format anyway. Maybe all your validation could be done against the text you'll use for import.
  • INSERT INTO emails SELECT * FROM emails_copy to put the emails into the production table. You might have to play with it a bit to get the autoincrement IDs right.
  • Once you're positive that the process succeeded, DROP TABLE emails_copy. (A rough sketch of the whole sequence follows.)
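
A minimal sketch of those steps through ActiveRecord's raw connection, assuming MySQL with LOAD DATA LOCAL INFILE enabled and a one-address-per-line file at /tmp/emails.txt (the file path, column layout, and the uniqueness/format checks are all illustrative assumptions):

  conn = ActiveRecord::Base.connection

  # 1. Bare staging table: same columns as `emails`, no indexes or constraints
  conn.execute("CREATE TABLE emails_copy (email VARCHAR(255), created_at DATETIME, updated_at DATETIME)")

  # 2. MySQL's fast import path; one address per line matches the default format
  conn.execute("LOAD DATA LOCAL INFILE '/tmp/emails.txt' INTO TABLE emails_copy (email)
                SET created_at = NOW(), updated_at = NOW()")

  # 3. Mimic the constraints of `emails`; duplicates surface as errors here
  conn.execute("ALTER TABLE emails_copy ADD UNIQUE INDEX index_emails_on_email (email)")

  # 4. Crude validation in raw SQL, e.g. drop rows without an '@'
  conn.execute("DELETE FROM emails_copy WHERE email NOT LIKE '%@%'")

  # 5. Move the rows over, letting `emails` assign fresh autoincrement ids
  conn.execute("INSERT INTO emails (email, created_at, updated_at)
                SELECT email, created_at, updated_at FROM emails_copy")

  # 6. Clean up
  conn.execute("DROP TABLE emails_copy")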
9000