49

I need to create a signature string for a variable in Ruby, where the variable can be a number, a string, a hash, or an array. The hash values and array elements can also be any of these types.

This string will be used to compare the values in a database (Mongo, in this case).

My first thought was to create an MD5 hash of a JSON-encoded value, like so (body is the variable referred to above):

require 'digest/md5'
require 'json'

def createsig(body)
  Digest::MD5.hexdigest(JSON.generate(body))
end

This nearly works, but JSON.generate does not encode the keys of a hash in the same order each time, so createsig({:a=>'a',:b=>'b'}) does not always equal createsig({:b=>'b',:a=>'a'}).

What is the best way to create a signature string to fit this need?

Note: For the detail oriented among us, I know that you can't JSON.generate() a number or a string. In these cases, I would just call MD5.hexdigest() directly.

TelegramSam
  • 3
    If this will be used for any sort of security purposes, please don't use MD5. – Alan Jun 23 '11 at 23:33
  • 2
    It is not being used for security purposes, but as a simple comparison via string representation. I don't NEED md5, but it's the closest thing I could think of. – TelegramSam Jun 23 '11 at 23:37
  • 1
    Do you need these values to be the same within a single process or across processes? You could use `x.hash` (or a combination of `x.hash` and `x.class`) if you don't need them to be consistent across processes. – mu is too short Jun 23 '11 at 23:56
  • 1
    As mentioned in the question, I will be storing these values in a database for comparison. I need them to be portable between processes. The comparison needs to be made on the value of the variable, not the specific variable itself. – TelegramSam Jun 24 '11 at 04:12
  • Just to expand on Alan's comment, use bcrypt for security purposes. One way hashing with a time cost to prevent brute force attacks. – superluminary Oct 10 '12 at 16:28
  • Just wanted to note that in Ruby 1.9.3+ this should not be a problem. See: http://stackoverflow.com/questions/31850741/order-of-keys-in-a-json-object-converted-to-a-ruby-hash-with-json-parse?rq=1 – Joe Edgar Jun 29 '16 at 12:56

6 Answers

34

I coded up the following pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.

This should properly flatten out and sort the arrays and hashes, and you'd need to have some pretty strange looking strings for there to be any collisions.

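require 'digest/md5'

def createsig(body)
  Digest::MD5.hexdigest(sigflat(body))
end

def sigflat(body)
  if body.is_a?(Hash)
    # Turn each pair into a "key=>value" string, then treat the pairs as an array
    body = body.map { |key, value| "#{sigflat key}=>#{sigflat value}" }
  end
  if body.is_a?(Array)
    # Flatten and sort the elements so ordering does not affect the result.
    # Note: map! and sort! modify an array passed in by the caller.
    body.map! { |value| sigflat value }.sort!
  end
  # Tag non-strings with their class name. Interpolation avoids appending
  # to the frozen strings that nil.to_s and true.to_s return in newer Rubies.
  body = "#{body}#{body.class}" unless body.is_a?(String)
  body
end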

> sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
=> true
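
One caveat with the version above: map! and sort! modify the caller's arrays in place. What follows is a minimal, lightly tested sketch of the same idea that leaves its input alone. The name sigflat_pure is hypothetical and not part of the original answer.

```ruby
require 'digest/md5'

# Non-mutating sketch: same flatten-and-sort idea, built with map/sort
# (no bang methods) so the caller's object graph is not changed.
def sigflat_pure(body)
  case body
  when Hash
    # Flatten each pair, then handle the list of pairs like any other array
    sigflat_pure(body.map { |k, v| "#{sigflat_pure(k)}=>#{sigflat_pure(v)}" })
  when Array
    parts = body.map { |v| sigflat_pure(v) }.sort
    "#{parts}Array"
  when String
    body
  else
    # Tag non-strings with their class so 1 and "1" differ
    "#{body}#{body.class}"
  end
end

def createsig_pure(body)
  Digest::MD5.hexdigest(sigflat_pure(body))
end

original = [3, [2, 1], { :b => 'b', :a => 'a' }]
snapshot = Marshal.load(Marshal.dump(original))
createsig_pure(original)
original == snapshot  # => true, the input is left untouched
```

As in the original, array elements are sorted, so [1, 2] and [2, 1] get the same signature. Drop the sort in the Array branch if element order should matter.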
Luke
  • The strings are not created equal: ruby-1.9.2-p180 :001 > a = {:aa=>"aa",:bb=>"bb"} => {:aa=>"aa", :bb=>"bb"} ruby-1.9.2-p180 :002 > b = {:bb=>"bb",:aa=>"aa"} => {:bb=>"bb", :aa=>"aa"} ruby-1.9.2-p180 :003 > a.inspect => "{:aa=>\"aa\", :bb=>\"bb\"}" ruby-1.9.2-p180 :004 > b.inspect => "{:bb=>\"bb\", :aa=>\"aa\"}" – TelegramSam Jun 24 '11 at 04:13
  • Changed answer to address the ordering issue. Let me know if you can think of any holes in it. – Luke Jun 24 '11 at 06:29
  • This only handles the top Hash, and doesn't address the same issue with Hashes deeper in the structure. Is there a way to get those as well? – TelegramSam Jun 24 '11 at 15:03
  • I see what you mean. I'll see what I can come up with and edit my answer. – Luke Jun 24 '11 at 18:51
  • @TelegramSam How does that look? – Luke Jun 24 '11 at 20:21
  • 4
    Warning: This mutates the original object graph – Joel Sep 27 '13 at 05:47
  • `object#inspect` is enough to get a string from any object type – brauliobo May 09 '19 at 13:04
16

If you could get a string representation of body in which a Ruby 1.8 hash did not come back in a different order from one time to the next, you could reliably hash that string representation. Let's get our hands dirty with some monkey patches:

require 'digest/md5'

class Object
  def md5key
    to_s
  end
end

class Array
  def md5key
    map(&:md5key).join
  end
end

class Hash
  def md5key
    sort.map(&:md5key).join
  end
end

Now any object (of the types mentioned in the question) responds to md5key by returning a reliable key to use for creating a checksum, so:

def createsig(o)
  Digest::MD5.hexdigest(o.md5key)
end

Example:

body = [
  {
    'bar' => [
      345,
      "baz",
    ],
    'qux' => 7,
  },
  "foo",
  123,
]
p body.md5key        # => "bar345bazqux7foo123"
p createsig(body)    # => "3a92036374de88118faf19483fe2572e"

Note: This hash representation does not encode the structure, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"].
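
If that ambiguity matters, one option is to keep delimiters in the key. The following is a minimal sketch of that idea as a standalone method rather than a monkey patch. The name structured_key is hypothetical, and it assumes inspect output for leaf values is stable enough for your purposes.

```ruby
require 'digest/md5'

# Hypothetical helper: wraps arrays and hashes in brackets and uses
# inspect for leaf values, so the nesting is part of the key.
def structured_key(obj)
  case obj
  when Hash
    pairs = obj.map { |k, v| "#{structured_key(k)}=>#{structured_key(v)}" }
    # Sort the rendered pairs so key order does not matter
    "{#{pairs.sort.join(',')}}"
  when Array
    "[#{obj.map { |e| structured_key(e) }.join(',')}]"
  else
    obj.inspect
  end
end

def structured_sig(obj)
  Digest::MD5.hexdigest(structured_key(obj))
end

structured_key(["a", "b", "c"]) == structured_key(["abc"])          # => false
structured_key({ 'x' => 1, 'y' => 2 }) == structured_key({ 'y' => 2, 'x' => 1 })  # => true
```

Because inspect quotes strings, 1 and "1" also produce different keys here.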

Wayne Conrad
1

Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. To ensure that the class types seen affect the hash, I inject a single Unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum.)

I did this as a refinement on Object, but it's easily ported to a monkey patch. To use it, just require the file and run "using ObjSum".

require 'digest/md5'
require 'set'

module ObjSum
  refine Object do
    def objsum
      parts = []
      queue = [self]

      # Breadth-first walk: containers emit a type marker and enqueue their contents
      while queue.size > 0
        item = queue.shift

        if item.kind_of?(Hash)
          parts << "\000"
          # Visit keys in sorted order so insertion order does not matter
          item.keys.sort.each do |k|
            queue << k
            queue << item[k]
          end
        elsif item.kind_of?(Set)
          parts << "\001"
          item.to_a.sort.each { |i| queue << i }
        elsif item.kind_of?(Enumerable)
          parts << "\002"
          item.each { |i| queue << i }
        elsif item.kind_of?(Integer) # Fixnum was removed in Ruby 3.2
          parts << "\003"
          parts << item.to_s
        elsif item.kind_of?(Float)
          parts << "\004"
          parts << item.to_s
        else
          parts << item.to_s
        end
      end

      Digest::MD5.hexdigest(parts.join)
    end
  end
end
Greg Fodor
1

Just my 2 cents:

module Ext
  module Hash
    module InstanceMethods
      # Return a string suitable for generating content signature.
      # Signature image does not depend on order of keys.
      #
      #   {:a => 1, :b => 2}.signature_image == {:b => 2, :a => 1}.signature_image                  # => true
      #   {{:a => 1, :b => 2} => 3}.signature_image == {{:b => 2, :a => 1} => 3}.signature_image    # => true
      #   etc.
      #
      # NOTE: Signature images of identical content generated under different versions of Ruby are NOT GUARANTEED to be identical.
      def signature_image
        # Store normalized key-value pairs here.
        ar = []

        each do |k, v|
          ar << [
            k.is_a?(::Hash) ? k.signature_image : [k.class.to_s, k.inspect].join(":"),
            v.is_a?(::Hash) ? v.signature_image : [v.class.to_s, v.inspect].join(":"),
          ]
        end

        ar.sort.inspect
      end
    end
  end
end

class Hash    #:nodoc:
  include Ext::Hash::InstanceMethods
end
Alex Fortuna
1

These days there is a formally defined method for canonicalizing JSON, for exactly this reason: https://datatracker.ietf.org/doc/html/draft-rundgren-json-canonicalization-scheme-16

There is a Ruby implementation here: https://github.com/dryruby/json-canonicalization
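
To illustrate just the core idea (sort object keys before serializing), here is a rough sketch. It is not an implementation of the canonicalization scheme linked above and does not use the gem's API. The names canonicalize and json_sig are hypothetical, and it assumes Ruby 2.1+ and a json library version that accepts top-level scalars.

```ruby
require 'json'
require 'digest/md5'

# Hypothetical helper: rebuilds the structure with hash keys converted to
# strings and sorted, so JSON.generate always sees the same key order.
def canonicalize(obj)
  case obj
  when Hash
    obj.map { |k, v| [k.to_s, canonicalize(v)] }.sort_by(&:first).to_h
  when Array
    # Array order is significant in JSON, so it is kept as-is
    obj.map { |e| canonicalize(e) }
  else
    obj
  end
end

def json_sig(body)
  Digest::MD5.hexdigest(JSON.generate(canonicalize(body)))
end

json_sig({ :a => 'a', :b => 'b' }) == json_sig({ :b => 'b', :a => 'a' })  # => true
```

Since keys are stringified, :a and 'a' are treated as the same key, which matches what JSON.generate does anyway. Number formatting and string escaping, which the formal scheme also pins down, are left to JSON.generate here.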

-1

Depending on your needs, you could call ary.inspect or ary.to_yaml, even.
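
One thing to watch for, shown as a small sketch: since Ruby 1.9 a hash keeps its insertion order and inspect reflects it, so two equal hashes built in a different order give different strings. Sorting the pairs first is one workaround for a flat hash whose keys can be compared with each other.

```ruby
a = { :aa => 'aa', :bb => 'bb' }
b = { :bb => 'bb', :aa => 'aa' }

a == b                            # => true
a.inspect == b.inspect            # => false
a.sort.inspect == b.sort.inspect  # => true
```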

Kudu