
I am busy writing a migration that will allow us to move our YAML engine (yamler) from Syck to Psych and finally upgrade our project to Ruby 2. This migration is going to be seriously resource-intensive, though, so I am going to need to use chunking.

I wrote the following method to confirm that the transformation I plan to use produces the expected result and can be done without downtime. To avoid ActiveRecord performing the serialization automatically, I needed to use ActiveRecord::Base.connection.execute.

My method that describes the transformation is as follows:

def show_summary(table, column_name)
  result = ActiveRecord::Base.connection.execute <<-SQL
    SELECT id, #{column_name} FROM #{table}
  SQL
  all_rows = result.to_a

  problem_rows = all_rows.select do |row|
    original_string = Syck.dump(Syck.load(row[1]))
    original_object = Syck.load(original_string)

    new_string = Psych.dump(original_object)
    new_object = Syck.load(new_string)

    Syck.dump(new_object) != original_string rescue true
  end

  problem_rows.map do |row|
    old_string = Syck.dump(Syck.load(row[1]))
    new_string = Psych.dump(Syck.load(old_string)) rescue "Parse failure"
    roundtrip_string = begin
      Syck.dump(Syck.load(new_string))
    rescue => e
      e.message
    end

    new_row = {}
    new_row[:id] = row[0]
    new_row[:original_encoding] = old_string
    new_row[:new_encoding] = roundtrip_string
    new_row
  end
end
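Syck only ships with Rubies before 2.0, so the check above cannot run on a modern interpreter. As a sketch of the same round-trip idea using Psych alone (the sample rows and the `problem_rows`/`problem_ids` names below are invented for illustration, not part of the migration):

```ruby
require 'psych'

# Rows as [id, yaml_string] pairs, standing in for the database result.
sample_rows = [
  [1, Psych.dump('name' => 'widget', 'count' => 3)],
  [2, "--- not: [valid"], # malformed YAML, should be flagged
]

# A row is a problem if it fails to parse, or if a load/dump cycle
# does not reproduce the original string.
problem_rows = sample_rows.select do |_id, yaml|
  begin
    Psych.dump(Psych.load(yaml)) != yaml
  rescue Psych::SyntaxError
    true
  end
end

problem_ids = problem_rows.map(&:first)
```

The same `select` with a rescue-to-true pattern is what the Syck/Psych version above relies on: any row that raises during the round trip is treated as a problem row.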

How can I use batching when making use of ActiveRecord::Base.connection.execute?

For completeness, my update function is as follows:

  # Migrate the given serialized YAML column from Syck to Psych
  # (if any).
  def migrate_to_psych(table, column)
    table_name = ActiveRecord::Base.connection.quote_table_name(table)

    column_name = ActiveRecord::Base.connection.quote_column_name(column)

    fetch_data(table_name, column_name).each do |row|
      transformed = ::Psych.dump(convert(Syck.load(row[column])))

      ActiveRecord::Base.connection.execute <<-SQL
         UPDATE #{table_name}
         SET #{column_name} = #{ActiveRecord::Base.connection.quote(transformed)}
         WHERE id = #{row['id']};
      SQL
    end
  end

  def fetch_data(table_name, column_name)
    ActiveRecord::Base.connection.select_all <<-SQL
       SELECT id, #{column_name}
       FROM #{table_name}
       WHERE #{column_name} LIKE '---%'
    SQL
  end

I got this from http://fossies.org/linux/openproject/db/migrate/migration_utils/legacy_yamler.rb

Donovan Thomson

1 Answer


You can easily build something with SQL's LIMIT and OFFSET clauses:

def fetch_data(table_name, column_name)
  batch_size, offset = 1000, 0
  begin
    batch = ActiveRecord::Base.connection.select_all <<-SQL
      SELECT id, #{column_name}
      FROM #{table_name}
      WHERE #{column_name} LIKE '---%'
      LIMIT #{batch_size}
      OFFSET #{offset}
    SQL
    batch.each do |row|
      yield row
    end
    offset += batch_size
  end until batch.empty?
end

which you can use in almost exactly the same way as before, just without the .each:

fetch_data(table_name, column_name) do |row| ... end
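The paging logic can also be factored out into a plain-Ruby sketch that takes the fetch step as a callable, which makes it easy to test without a database (the names `each_batch` and `fetch_page` are illustrative, not part of ActiveRecord; this variant uses `loop`/`break` instead of `begin; end until`):

```ruby
# Generic batching loop: fetch_page.call(limit, offset) stands in for the
# SELECT ... LIMIT ... OFFSET query and must return an array of rows.
def each_batch(batch_size, fetch_page)
  offset = 0
  loop do
    batch = fetch_page.call(batch_size, offset)
    break if batch.empty?
    batch.each { |row| yield row }
    offset += batch_size
  end
end

# Usage sketch against an in-memory "table":
rows = (1..10).map { |id| { 'id' => id } }
fetch_page = ->(limit, offset) { rows[offset, limit] || [] }

seen = []
each_batch(3, fetch_page) { |row| seen << row['id'] }
# seen now covers every id exactly once
```

Note that incrementing the offset while mutating rows is only safe here because a Psych-dumped value still starts with `---`, so updated rows keep matching the `LIKE '---%'` filter.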

HTH!

Raffael
    By the way, I always try to avoid using ActiveRecord models in migrations. Simply because the model's source code is subject to change but my migration might still be used a year from now to set up test databases. – Raffael Jan 14 '16 at 19:34
  • Thanks for your answer! This is really helpful! Yes, I had some pain earlier this year on my project with models changing and then data migrations that used ActiveRecord failing. – Donovan Thomson Jan 15 '16 at 06:32
  • 2
  • This works nicely; just one note: [don't use the begin; end until syntax](https://stackoverflow.com/questions/136793/is-there-a-do-while-loop-in-ruby) – fguillen Aug 03 '17 at 14:57