0

If there are multiple records on a specific date I want to remove all but latest record on that day. For example in below table records with id 9, 10, 12 have same date. So 9 and 10 should be removed as record with id 12 is having latest date.

id      date
1   2012-04-25 00:00:00.000000
2   2012-04-26 00:00:00.000000
3   2012-04-23 00:00:00.000000
4   2012-04-24 00:00:00.000000
5   2012-05-01 00:00:00.000000
6   2012-05-02 00:00:00.000000
7   2012-05-03 00:00:00.000000
8   2012-05-04 00:00:00.000000
9   2012-04-30 00:30:00.000000
10  2012-04-30 18:00:00.000000
11  2012-04-29 00:00:00.000000
12  2012-04-30 18:40:00.000000
13  2012-05-05 00:00:00.000000
14  2012-05-05 09:31:31.000000

Here is (dirty)rake task to remove duplicates

task :remove_duplicate do
  Rake::Task["remove_duplicate"].invoke
end

task :remove_duplicate => :environment do
  weights = Weight.count(:group => "DATE(date)", :having => "COUNT(id) > 1")
  weights_to_delete = []
  weights.each do |weight|

    start_date = weight[0].to_date.beginning_of_day
    end_date = weight[0].to_date.end_of_day
    day_weights = Weight.where("date >= ? and date <= ?", start_date, end_date).order(:date)
    day_weights[0..-2].each do |weight|
      weights_to_delete.push weight.id
    end
  end
  Weight.delete(weights_to_delete)
end

Though I am able to remove records as I explained, I am not satisfied with the approach I take. Please guide me to remove duplicate records on specific date keeping latest one only utilizing ActiveRecord API better way.

Thanks, Amit Patel

Amit Patel
  • 15,609
  • 18
  • 68
  • 106

3 Answers3

4

This method could be slow so I don't recommend it unless you're running it periodically.

Weight.all.each do |weight|
  Weight.order("id desc").where(date: weight.date).all.drop(1).each { |w| w.delete }
end
bricker
  • 8,911
  • 2
  • 44
  • 54
  • Slow and steady but it did the job, and when used as a one time operation I prefer to read clearly understandable code before speed. – lime Jul 31 '13 at 16:10
0

Try this:

latest_daily_weights = (Weight.maximum :date, :group => 'DATE(date)').values
weights_table = Arel::Table.new(:weights)
earlier_daily_weights = Weight.where(weights_table[:date].not_in latest_daily_weights)
earlier_daily_weights.delete_all

Credit:

How to exclude an array of ids from query in Rails (using ActiveRecord)?

Community
  • 1
  • 1
Steve Rowley
  • 1,548
  • 1
  • 11
  • 18
0

You can try this sql query, to remove records of same date but latest one on that date

DELETE FROM weights USING weights weight WHERE (CAST(weights.date as Date) = CAST(weight.date as Date) AND weights.id < weight.id);