5

My question is related to user defined function in set operations, but I think I can cut to the core of the problem:

How do I choose a specific hashing function? For example, if I want to do value-based matching rather than reference-matching, and I want to see whether a certain tuple is present (or simply delete it):

my %data := SetHash.new: (1, 2), (3, 4);
%data{$(1, 2)}:delete; # False

In C++ or C#, I could give a custom hashing/comparison function to the constructor. In C#, hashing by value will automatically happen if my data type is a struct (a value-type rather than a reference-type). Perl 6 does value-type hashing to some extent for Pair (if the Pair doesn't contain any containers), but I don't see how to make it work for any other complex type.

On one hand, I get why this isn't the safest operation--it's easy to define objects whose hash code can change after they've been inserted. But that hasn't stopped .NET and the C++ STL from allowing custom hashing.

A possible API usage (with the chained hashing logic inspired by this, originally from Boost) would be:

class MyHasher does Hasher of Array[Int] {
  method get-hash-value(Int @array) {
    reduce
      -> $a, $b {$a +^ ($b + 0x9e3779b97f4a7c16 + ($a +< 6) + ($a +> 2))},
      0,
      |@array;
  }
  method equals(Int @a, Int @b) { @a eqv @b; }
}

my %data := SetHash.new(
  my Int @=[1, 2], my Int @=[3, 4],
  :hasher(MyHasher.new)
);
say %data{$[1, 2]}; # should be True

And this would be the hasher role, which should be provided by the Perl 6's core library, if it doesn't already exist:

role Hasher[::T=Any] { method equals(T $a, T $b --> Bool) { ... }; method get-hash-value(T $obj) { ... } }

Solution: At present, the most reasonable solution is to override a class's .WHICH method, which serves as a hash value and is used for equality testing. I gave an example of a hash-key class which emulates a value-type here. It's almost as versatile as a custom hash function per hash object, since the key type can be declared when the hash is created. (This can't be done for Set since Set isn't parameterized.)

jjmerelo
  • 22,578
  • 8
  • 40
  • 86
piojo
  • 6,351
  • 1
  • 26
  • 36
  • 1
    Note that the constructor can't simply be overloaded, because the current design is that all arguments are turned into set elements, so `:hasher(hash-logic)` would get swallowed into the data structure. – piojo Dec 10 '17 at 09:32
  • 2
    FWIW, `Set.new` only takes positionals, and currently ignores named parameters: `dd Set.new(:foo) # Set.new()` – Elizabeth Mattijsen Dec 10 '17 at 14:16
  • 1
    Allowing for a custom Hasher function would be quite a lot of work as far as I can see at the moment, and would affect performance negatively in the general case. Also, I'm not entirely sure whether `.classify` (https://docs.perl6.org/routine/classify) with a mapper such as above, isn't really what you are after. Could it be? – Elizabeth Mattijsen Dec 10 '17 at 15:21
  • Thanks for the correction, @ElizabethMattijsen. I sometimes forget the difference between passing a pair and a named argument [which is a pair]. – piojo Dec 11 '17 at 02:42
  • @ElizabethMattijsen I need to play more with `List.classify`. It looks like the perfect back-end for the feature I'm describing, needing only a wrapper class to do collision resolution and provide typical hash methods. I especially like that `.classify` decontainerizes the keys. It's quite clean that the result is recursively keyed if the mapper function returns a list as the key. – piojo Dec 11 '17 at 02:54
  • 1
    Please note that `.classify` also has an `:into`' named parameter (which appears to be undocumented apparently), that allows `my %h = 97 => ["a"]; .classify( *.ord, :into(%h) ); dd %h # Hash %h = {"100" => $["d"], "97" => $["a"], "98" => $["b"], "99" => $["c"]}`. Which should make using `.classify` as a type of `Set` easier. – Elizabeth Mattijsen Dec 11 '17 at 21:10
  • @ElizabethMattijsen Thank you, that's also very helpful. That looks like the same functionality as `%hsh.classify-list`, which may be why it's not documented. – piojo Dec 12 '17 at 03:31
  • You could make a new class that does *Associative*, or even just subclass *Hash*. – Brad Gilbert May 07 '18 at 16:54

1 Answers1

1

The way a Hash works is you use a key to store a value, and use the exact same key to retrieve the value.

In the case of value types like Str and Int, you can have multiple instances which act as if they are the exact same value. So 42 and 40 + 2 act as if they are the exact same instance even if they aren't.

So this works:

my %h{Any}; # defaults to Str keys

%h{'abc'} = 42;

my ($a,$b,$c) = 'a', 'b', 'c';

say %h{"$a $b $c"}; # 42

%h{42} = 'The answer';

say %h{"42"}; # (Any)
say %h{42}; # The answer

There isn't really a facility to make several different values pretend to be the same value for just a Hash.

'abc' === 'cba'; # False

'abc'.WHICH eq 'cba'.WHICH; # basically how the above works

I think that what you are asking for is a feature which should not be added.

There is a WHICH method, which should only used to make two values identical everywhere throughout the language.

say 42.WHICH.perl;       # ValueObjAt.new("Int|42")
say (40 + 2).WHICH.perl; # ValueObjAt.new("Int|42")
42 === (40 + 2);         # True

say Hash.new.WHICH.perl; # ObjAt.new("Hash|94014087733456")
say Hash.new.WHICH.perl; # ObjAt.new("Hash|94014087735232")

Note that for the Hash.new that they don't match, because they are different instances that can change over time.

As an example where this is a good thing. Let's say you have two employees named 'Bob'.

my $a = Employee.new( name => 'Bob' );
my $b = Employee.new( name => 'Bob' );

my %salary{Employee};

%salary{$a} = 1200; # arbitrary number for an example
%salary{$b} = 2000;

Note that by overriding the WHICH method you could end up accidently giving Bob $a a raise.

Basically it is probably not a good idea to mess with .WHICH unless you know exactly what you are doing, and you have a really good reason to.


So you can't/shouldn't do that. At least not the way you are trying to do it.

Instead make a new Associative class which works the way you want.

role Custom-Str-Hasher {
  method hashed ( --> Str:D ){…}
}

class Custom-SetHash is SetHash {
  multi method AT-KEY ( Custom-Str-Hasher:D $key ) is rw {
    self.AT-KEY( $key.hashed() ); # call base class's method
  }
}


class Foo does Custom-Str-Hasher {
  has Str:D $.Str is required;

  # required by the Custom-Str-Hasher role
  method hashed ( --> Str:D ){
    $!Str.comb(/\w/).unique.sort.join;
    # 'b cb a' → 'abc' and 'aaababcccba' → 'abc'
  }
}

my $a = Foo.new(:Str('b cb a'));
my $b = Foo.new(:Str('aaababcccba'));

my %h is Custom-SetHash; # use a different class than the default

%h{$a} = True;
say %h{$b}; # True;

put $a; # b cb a
put $b; # aaababcccba

Note that the above is just a quick example, there is a bunch of things I would change for a real example. For one, %h{'abc'} would also return True because of the way I implemented the AT-KEY method. It is also missing a bunch of methods like ASSIGN-KEY and DELETE-KEY.

Brad Gilbert
  • 33,846
  • 11
  • 78
  • 129
  • Thanks. However, I urge you to reconsider the idea that this is weird or risky. I grant you that the STL is not the nicest API. But C# is really quite a lovely language, and it allows custom hashing. It's really necessary for treating objects as value-types. In the case where you are using names rather than employees, `Name.new(name => 'Bob')` really is semantically identical to `Name.new(name => 'Bob')`, though they can become different if one of them has a value change. Perhaps this concept maps poorly to Perl 6. I agree that redefining `WHICH` is not ideal in a big program. – piojo Jan 01 '19 at 14:32
  • @piojo It's probably fine in C#, but in Perl 6 it is risky. I am saying that the two Employee objects have identical information, but they could be referring to two different people. Also what if you change one of the Bobs to be Robert? Now you can't get at the proper data anymore using his object if you changed `.WHICH` to be based on the name. Changing `.WHICH` has huge impacts on how Perl 6 works, the feature in C# has minimal impact. If you were going to create a `.WHICH` for Employee, I would base it on employee identification number which shouldn't ever change or have a data collision. – Brad Gilbert Jan 01 '19 at 17:01