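I'm trying to compare two arrays of hashes and delete old records. The stale records (stale_records) are the records in old_records that don't exist in new_records. Items in either array can be duplicated, and each record in new_records should cancel out only one matching record in old_records.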

old_records = [{a: 1}, {b: 2}]
new_records = [{a: 1}]
stale_records #=> [{:b=>2}]

old_records = [{a: 1}, {a: 1}]
new_records = [{a: 1}]
stale_records #=> [{a: 1}]

I am wondering if there is an efficient way to do this for a few million records.

I tried:

old_records = [{a: 1}, {b: 2}]
new_records = [{a: 1}]
stale_records = old_records - new_records #=> [{:b=>2}]

old_records = [{a: 1}, {a: 1}]
new_records = [{a: 1}]
stale_records = old_records - new_records #=> []

which does not give the correct result when items are duplicated, because Array#- removes every element of old_records that is equal to any element of new_records.

sawa
L457
  • It'd help us if you gave an example of "cleanly handle" :) if ruby worked the way you'd like it to, what would you have to write? – Taryn East Dec 04 '17 at 01:50
  • I guess efficient would be a better word. The way I initially thought of handling it was to loop through each element of one array and compare/delete it from the other separately rather than using subtraction. – L457 Dec 04 '17 at 03:22
  • Agree with above, what do you expect to be stored in stale_records? – grail Dec 04 '17 at 03:22
  • sorry, updated the question :) – L457 Dec 04 '17 at 03:28
  • Hmmm, when it comes to efficiencies over a few million records... I start thinking of "can I do this in the database itself"... have you looked into what SQL can offer you? – Taryn East Dec 04 '17 at 22:56
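
The per-element idea mentioned in the comments can be done in a single pass over each array by counting records instead of deleting them one at a time. What follows is only a minimal sketch, not a confirmed answer: it assumes the records are plain hashes that compare by value, that the result should keep the order of old_records, and that everything fits in memory. The helper name stale_records_for is hypothetical.

```ruby
# Hypothetical helper (name is an assumption, not from the question):
# returns the records of old_records that have no unmatched copy in
# new_records, treating both arrays as multisets.
def stale_records_for(old_records, new_records)
  # Count how many times each record appears in new_records.
  # Ruby hashes can be used as hash keys because they hash by content.
  remaining = Hash.new(0)
  new_records.each { |record| remaining[record] += 1 }

  # Walk old_records once; each copy in new_records cancels exactly
  # one equal record in old_records, and whatever is left is stale.
  old_records.reject do |record|
    if remaining[record] > 0
      remaining[record] -= 1
      true   # matched by a copy in new_records, so not stale
    else
      false  # no unmatched copy left, so this record is stale
    end
  end
end

stale_records_for([{a: 1}, {b: 2}], [{a: 1}]) #=> [{:b=>2}]
stale_records_for([{a: 1}, {a: 1}], [{a: 1}]) #=> [{:a=>1}]
```

Each array is traversed once, so the work grows linearly with the number of records, at the cost of one counter per distinct record in new_records.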

0 Answers