129

I am trying to compare two Ruby Hashes using the following code:

#!/usr/bin/env ruby

require "yaml"
require "active_support"

file1 = YAML::load(File.open('./en_20110207.yml'))
file2 = YAML::load(File.open('./locales/en.yml'))

arr = []

file1.select { |k,v|
  file2.select { |k2, v2|
    arr << "#{v2}" if "#{v}" != "#{v2}"
  }
}

puts arr

The output to the screen is the full file from file2. I know for a fact that the files are different, but the script doesn't seem to pick it up.

the Tin Man
dennismonsewicz

14 Answers

187

You can compare hashes directly for equality:

hash1 = {'a' => 1, 'b' => 2}
hash2 = {'a' => 1, 'b' => 2}
hash3 = {'a' => 1, 'b' => 2, 'c' => 3}

hash1 == hash2 # => true
hash1 == hash3 # => false

hash1.to_a == hash2.to_a # => true
hash1.to_a == hash3.to_a # => false
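
Since hashes loaded from YAML are usually nested, it may help to see that `==` also compares nested values. A minimal sketch, assuming made-up data shaped like a small locale file (the names `nested1`..`nested3` are only for illustration):

```ruby
# Hypothetical nested data, shaped like a small locale file.
nested1 = { 'en' => { 'greeting' => { 'hello' => 'Hi' } } }
nested2 = { 'en' => { 'greeting' => { 'hello' => 'Hi' } } }
nested3 = { 'en' => { 'greeting' => { 'hello' => 'Hello' } } }

equal_1_2 = (nested1 == nested2)  # every nested value matches
equal_1_3 = (nested1 == nested3)  # one inner value differs

puts "nested1 == nested2: #{equal_1_2}"
puts "nested1 == nested3: #{equal_1_3}"
```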


You can convert the hashes to arrays, then get their difference:

hash3.to_a - hash1.to_a # => [["c", 3]]

if (hash3.size > hash1.size)
  difference = hash3.to_a - hash1.to_a
else
  difference = hash1.to_a - hash3.to_a
end
Hash[*difference.flatten] # => {"c"=>3}

Simplifying further:

Assigning difference via a ternary structure:

  difference = (hash3.size > hash1.size) \
                ? hash3.to_a - hash1.to_a \
                : hash1.to_a - hash3.to_a
=> [["c", 3]]
  Hash[*difference.flatten] 
=> {"c"=>3}

Doing it all in one operation and getting rid of the difference variable:

  Hash[*(
  (hash3.size > hash1.size)    \
      ? hash3.to_a - hash1.to_a \
      : hash1.to_a - hash3.to_a
  ).flatten] 
=> {"c"=>3}
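
The size check above only looks in one direction, so two hashes of the same size with different values would lose half of the difference. A minimal sketch that collects both directions, assuming flat hashes and a Ruby with `Array#to_h` (2.1 or later); `hash_symmetric_difference` and `hash4` are hypothetical names:

```ruby
# Collect the pairs that appear on only one side, in both directions.
def hash_symmetric_difference(left, right)
  only_left  = left.to_a - right.to_a
  only_right = right.to_a - left.to_a
  { only_left: only_left.to_h, only_right: only_right.to_h }
end

hash1 = { 'a' => 1, 'b' => 2 }
hash4 = { 'a' => 1, 'b' => 3 }  # same size as hash1, one different value

result = hash_symmetric_difference(hash1, hash4)
p result
```

Keeping the two sides separate avoids the key collision you would get by merging them into one hash when the same key has different values.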
the Tin Man
  • 4
    Is there any way to get the differences between the two? – dennismonsewicz Feb 08 '11 at 01:57
  • 5
    Hashes can be the same size but contain different values. In that case, both `hash1.to_a - hash3.to_a` and `hash3.to_a - hash1.to_a` may return non-empty results even though `hash1.size == hash3.size`. The part after **EDIT** is valid only if the hashes are of different sizes. – ohaleck Oct 16 '14 at 19:26
  • 3
    Nice, but should have quit while ahead. A.size > B.size doesn't necessarily mean A includes B. Still need to take the union of symmetric differences. – Gene Mar 05 '15 at 03:20
  • Directly comparing the output of `.to_a` will fail when equal hashes have keys in a different order: `{a:1, b:2} == {b:2, a:1}` => true, `{a:1, b:2}.to_a == {b:2, a:1}.to_a` => false – aidan Jan 27 '17 at 05:53
  • what's the purpose of `flatten` and `*`? Why not just `Hash[A.to_a - B.to_a]`? – JeremyKun Feb 22 '17 at 01:41
  • or `difference.to_h` – Chen Kinnrot Sep 24 '17 at 09:42
  • @ohaleck You are right! That's why I prefer to use: `hash1.to_a - hash2.to_a | hash2.to_a - hash1.to_a`. Please take a look at my answer => https://stackoverflow.com/questions/4928789/how-do-i-compare-two-hashes/57862282#57862282 – Victor Sep 09 '19 at 23:34
40

You can try the hashdiff gem, which allows deep comparison of hashes and arrays in the hash.

The following is an example:

a = {a:{x:2, y:3, z:4}, b:{x:3, z:45}}
b = {a:{y:3}, b:{y:3, z:30}}

diff = HashDiff.diff(a, b)
diff.should == [['-', 'a.x', 2], ['-', 'a.z', 4], ['-', 'b.x', 3], ['~', 'b.z', 45, 30], ['+', 'b.y', 3]]
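
If you cannot add a gem, the idea can be approximated by hand. This is only a rough sketch of a recursive diff for nested hashes and not the gem's implementation: `deep_diff` is a hypothetical helper, it borrows the `-`, `+`, `~` markers from the output above, and it compares arrays as whole values instead of diffing their elements.

```ruby
# Hypothetical helper, not part of the hashdiff gem.
# Returns [op, path, value(s)] tuples for two nested hashes.
def deep_diff(left, right, prefix = nil)
  (left.keys | right.keys).flat_map do |key|
    path = prefix ? "#{prefix}.#{key}" : key.to_s
    if !right.key?(key)
      [['-', path, left[key]]]                       # key only on the left
    elsif !left.key?(key)
      [['+', path, right[key]]]                      # key only on the right
    elsif left[key].is_a?(Hash) && right[key].is_a?(Hash)
      deep_diff(left[key], right[key], path)         # recurse into nested hashes
    elsif left[key] != right[key]
      [['~', path, left[key], right[key]]]           # value changed
    else
      []                                             # identical, nothing to report
    end
  end
end

a = { a: { x: 2, y: 3, z: 4 }, b: { x: 3, z: 45 } }
b = { a: { y: 3 }, b: { y: 3, z: 30 } }

changes = deep_diff(a, b)
changes.each { |change| p change }
```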
the Tin Man
liu fengyun
  • 4
    I had some fairly deep hashes causing test failures. By replacing the `got_hash.should eql expected_hash` with `HashDiff.diff(got_hash, expected_hash).should eql []` I now get output which shows exactly what I need. Perfect! – davetapley Jul 24 '12 at 19:29
  • Wow, HashDiff is awesome. Made quick work of trying to see what has changed in a huge nested JSON array. Thanks! – Jeff Wigal Oct 28 '14 at 16:32
  • Your gem is awesome! Super helpful when writing specs involving JSON manipulations. Thx. – Alain Jun 23 '15 at 18:31
  • 2
    My experience with HashDiff has been that it works really well for small hashes but the diff speed doesn't seem to scale well. Worth benchmarking your calls to it if you expect it may get fed two large hashes and making sure that the diff time is within your tolerance. – David Bodow Jul 18 '18 at 23:04
  • Using the `use_lcs: false` flag can significantly speed up comparisons on large hashes: `Hashdiff.diff(b, a, use_lcs: false)` – Eric Walker May 07 '20 at 13:59
  • For anyone (like me) who might get tripped up by this, it should (now?) be Hashdiff, not HashDiff. – Travis Kriplean Aug 08 '22 at 15:27
21

If you want to find the differences between two hashes, you can do this:

h1 = {:a => 20, :b => 10, :c => 44}
h2 = {:a => 2, :b => 10, :c => "44"}
result = {}
h1.each {|k, v| result[k] = h2[k] if h2[k] != v }
p result #=> {:a => 2, :c => "44"}
Guilherme Bernal
11

Rails is deprecating the diff method.

For a quick one-liner:

hash1.to_s == hash2.to_s
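
One caveat worth checking before relying on the string form: `Hash#to_s` follows insertion order, while `Hash#==` does not. A small sketch with made-up hashes:

```ruby
hash1 = { 'a' => 1, 'b' => 2 }
hash2 = { 'b' => 2, 'a' => 1 }  # same pairs, different insertion order

same_content = (hash1 == hash2)            # Hash#== ignores insertion order
same_string  = (hash1.to_s == hash2.to_s)  # the string form keeps insertion order

puts "== says: #{same_content}, to_s says: #{same_string}"
```

So the one-liner is only safe when both hashes were built in the same key order.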
Evan
10

You could use a simple array intersection. That way you can see which elements are exclusive to each hash.

    hash1 = { a: 1 , b: 2 }
    hash2 = { a: 2 , b: 2 }

    overlapping_elements = hash1.to_a & hash2.to_a

    exclusive_elements_from_hash1 = hash1.to_a - overlapping_elements
    exclusive_elements_from_hash2 = hash2.to_a - overlapping_elements
ErvalhouS
2

I wrote this to check whether two hashes are equal:

def hash_equal?(hash1, hash2)
  array1 = hash1.to_a
  array2 = hash2.to_a
  (array1 - array2 | array2 - array1) == []
end

The usage:

> hash_equal?({a: 4}, {a: 4})
=> true
> hash_equal?({a: 4}, {b: 4})
=> false

> hash_equal?({a: {b: 3}}, {a: {b: 3}})
=> true
> hash_equal?({a: {b: 3}}, {a: {b: 4}})
=> false

> hash_equal?({a: {b: {c: {d: {e: {f: {g: {h: 1}}}}}}}}, {a: {b: {c: {d: {e: {f: {g: {h: 1}}}}}}}})
=> true
> hash_equal?({a: {b: {c: {d: {e: {f: {g: {marino: 1}}}}}}}}, {a: {b: {c: {d: {e: {f: {g: {h: 2}}}}}}}})
=> false
Victor
2

Here is an algorithm that deeply compares two Hashes, including any nested Arrays:

    HashDiff.new(
      {val: 1, nested: [{a:1}, {b: [1, 2]}] },
      {val: 2, nested: [{a:1}, {b: [1]}] }
    ).report
# Output:
val:
- 1
+ 2
nested > 1 > b > 1:
- 2

Implementation:

class HashDiff

  attr_reader :left, :right

  def initialize(left, right, config = {}, path = nil)
    @left  = left
    @right = right
    @config = config
    @path = path
    @conformity = 0
  end

  def conformity
    find_differences
    @conformity
  end

  def report
    @config[:report] = true
    find_differences
  end

  def find_differences
    if hash?(left) && hash?(right)
      compare_hashes_keys
    elsif left.is_a?(Array) && right.is_a?(Array)
      compare_arrays
    else
      report_diff
    end
  end

  def compare_hashes_keys
    combined_keys.each do |key|
      l = value_with_default(left, key)
      r = value_with_default(right, key)
      if l == r
        @conformity += 100
      else
        compare_sub_items l, r, key
      end
    end
  end

  private

  def compare_sub_items(l, r, key)
    diff = self.class.new(l, r, @config, path(key))
    @conformity += diff.conformity
  end

  def report_diff
    return unless @config[:report]

    puts "#{@path}:"
    puts "- #{left}" unless left == NO_VALUE
    puts "+ #{right}" unless right == NO_VALUE
  end

  def combined_keys
    (left.keys + right.keys).uniq
  end

  def hash?(value)
    value.is_a?(Hash)
  end

  def compare_arrays
    l, r = left.clone, right.clone
    l.each_with_index do |l_item, l_index|
      max_item_index = nil
      max_conformity = 0
      r.each_with_index do |r_item, i|
        if l_item == r_item
          @conformity += 1
          r[i] = TAKEN
          break
        end

        diff = self.class.new(l_item, r_item, {})
        c = diff.conformity
        if c > max_conformity
          max_conformity = c
          max_item_index = i
        end
      end or next

      if max_item_index
        key = l_index == max_item_index ? l_index : "#{l_index}/#{max_item_index}"
        compare_sub_items l_item, r[max_item_index], key
        r[max_item_index] = TAKEN
      else
        compare_sub_items l_item, NO_VALUE, l_index
      end
    end

    r.each_with_index do |item, index|
      compare_sub_items NO_VALUE, item, index unless item == TAKEN
    end
  end

  def path(key)
    p = "#{@path} > " if @path
    "#{p}#{key}"
  end

  def value_with_default(obj, key)
    obj.fetch(key, NO_VALUE)
  end

  module NO_VALUE; end
  module TAKEN; end

end

Daniel Garmoshka
1

If you need a quick-and-dirty diff between hashes that correctly handles nil values, you can use something like:

def diff(one, other)
  (one.keys + other.keys).uniq.inject({}) do |memo, key|
    unless one.key?(key) && other.key?(key) && one[key] == other[key]
      memo[key] = [one.key?(key) ? one[key] : :_no_key, other.key?(key) ? other[key] : :_no_key]
    end
    memo
  end
end
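
A usage sketch may make the `:_no_key` marker clearer. The method is restated here (in an equivalent `each_with_object` form) only so the snippet runs on its own; the `before` and `after` hashes are made up:

```ruby
# Same logic as the method above, restated so this snippet is self-contained.
def diff(one, other)
  (one.keys | other.keys).each_with_object({}) do |key, memo|
    next if one.key?(key) && other.key?(key) && one[key] == other[key]
    # fetch with a default distinguishes "key missing" from "value is nil"
    memo[key] = [one.fetch(key, :_no_key), other.fetch(key, :_no_key)]
  end
end

before = { name: 'x', note: nil, old: 1 }
after  = { name: 'x', note: 'hi', new: nil }

changes = diff(before, after)
p changes
```

Here `note` changes from `nil` to a string, `old` exists only on the left, and `new` exists only on the right with a `nil` value, so all three are reported and `name` is not.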
Ev Dolzhenko
1

If you want a nicely formatted diff, you can do this:

# Gemfile
gem 'awesome_print' # or gem install awesome_print

And in your code:

require 'ap'

def my_diff(a, b)
  as = a.ai(plain: true).split("\n").map(&:strip)
  bs = b.ai(plain: true).split("\n").map(&:strip)
  ((as - bs) + (bs - as)).join("\n")
end

puts my_diff({foo: :bar, nested: {val1: 1, val2: 2}, end: :v},
             {foo: :bar, n2: {nested: {val1: 1, val2: 3}}, end: :v})

The idea is to use awesome_print to format both objects, then diff the formatted lines. The diff won't be exact, but it is useful for debugging.

Benjamin Crouzier
1

... and now in module form to be applied to a variety of collection classes (Hash among them). It's not a deep inspection, but it's simple.

# Enable "diffing" and two-way transformations between collection objects
module Diffable
  # Calculates the changes required to transform self to the given collection.
  # @param b [Enumerable] The other collection object
  # @return [Array] The Diff: A two-element change set representing items to exclude and items to include
  def diff( b )
    a, b = to_a, b.to_a
    [a - b, b - a]
  end

  # Consume return value of Diffable#diff to produce a collection equal to the one used to produce the given diff.
  # @param to_drop [Enumerable] items to exclude from the target collection
  # @param to_add  [Enumerable] items to include in the target collection
  # @return [Array] New transformed collection equal to the one used to create the given change set
  def apply_diff( to_drop, to_add )
    to_a - to_drop + to_add
  end
end

if __FILE__ == $0
  # Demo: Hashes with overlapping keys and somewhat random values.
  Hash.send :include, Diffable
  rng = Random.new
  a = (:a..:q).to_a.reduce(Hash[]){|h,k| h.merge! Hash[k, rng.rand(2)] }
  b = (:i..:z).to_a.reduce(Hash[]){|h,k| h.merge! Hash[k, rng.rand(2)] }
  raise unless a == Hash[ b.apply_diff(*b.diff(a)) ] # change b to a
  raise unless b == Hash[ a.apply_diff(*a.diff(b)) ] # change a to b
  raise unless a == Hash[ a.apply_diff(*a.diff(a)) ] # change a to a
  raise unless b == Hash[ b.apply_diff(*b.diff(b)) ] # change b to b
end
Iron Savior
0

What about converting both hashes with to_json and comparing the resulting strings? Keep in mind, though, that values of different types still compare as different:

require "json"
h1 = {a: 20}
h2 = {a: "20"}

h1.to_json==h1.to_json
=> true
h1.to_json==h2.to_json
=> false
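
Another thing to keep in mind, sketched with made-up hashes: JSON has no symbols, so symbol keys and string keys serialize to the same text even though the hashes themselves are not equal.

```ruby
require 'json'

sym_keys = { a: 20 }
str_keys = { 'a' => 20 }

equal_as_hashes = (sym_keys == str_keys)                  # :a and 'a' are different keys
equal_as_json   = (sym_keys.to_json == str_keys.to_json)  # both serialize to {"a":20}

puts "as hashes: #{equal_as_hashes}, as JSON: #{equal_as_json}"
```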
stbnrivas
0

In my case, I wanted the changed attributes merged into before/after pairs like { status: [:collecting, :out_for_delivery] }, so I did:

    before = attributes.without(*IGNORED_ATTRIBUTES)
    after = replacement.attributes
    diff = before.map do |key, _|
      [key, [before[key], after[key]]] if before[key] != after[key]
    end
    diff.compact.to_h
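
The snippet above depends on model attributes and on ActiveSupport's `without`. Outside Rails, the same shape can be sketched in plain Ruby; the `before` and `after` hashes here are made up for illustration:

```ruby
# Hypothetical attribute hashes standing in for the model attributes above.
before = { status: :collecting, id: 7 }
after  = { status: :out_for_delivery, id: 7 }

# Keep only the keys whose value changed, as [old, new] pairs.
diff = before.each_with_object({}) do |(key, old_value), memo|
  memo[key] = [old_value, after[key]] if old_value != after[key]
end

p diff
```

Like the original, this only walks the keys of `before`, so keys that exist only in `after` are not reported.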
Dorian
0

This was answered in "Comparing ruby hashes". Rails adds a diff method to hashes. It works well.

Community
Wolfram Arnold
  • 7
    The [diff method](http://apidock.com/rails/Hash/diff) is deprecated in Rails versions newer than v4.0.2. – Andres Apr 24 '15 at 11:53
-5

How about another, simpler approach:

require 'fileutils'
# FileUtils.cmp takes file paths, not the hashes loaded from them
FileUtils.cmp('./en_20110207.yml', './locales/en.yml')
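
Note that comparing bytes and comparing parsed data can give different answers. A small sketch using inline YAML strings (made up, standing in for the two files) instead of real paths:

```ruby
require 'yaml'

# Two hypothetical documents with the same keys in a different order.
yaml_a = "en:\n  hello: Hi\n  bye: Bye\n"
yaml_b = "en:\n  bye: Bye\n  hello: Hi\n"

same_text = (yaml_a == yaml_b)                        # text comparison, like comparing files
same_data = (YAML.load(yaml_a) == YAML.load(yaml_b))  # comparison of the parsed hashes

puts "same text: #{same_text}, same data: #{same_data}"
```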
Mike
  • 4
    That only is meaningful if you need the hashes to be identical on the disk. Two files that are different on disk because the hash elements are in different orders, can still contain the same elements, and will be equal as far as Ruby is concerned once they are loaded. – the Tin Man Dec 27 '11 at 05:55