2

I've two strings StringA, StringB. I want to generate a unique string to denote this pair.

i.e.

f(x, y) should be unique for every x, y and f(x, y) = f(y, x) where x, y are strings.

Any ideas?

NLV
  • 21,141
  • 40
  • 118
  • 183
  • Can you give an example, please? And what did you try so far? – Martin Hennings Feb 24 '11 at 09:03
  • I'm not a build a logic to get a unique value like that. I want f('ABC', 'DEF') = 'someuniquevalue' = f('DEF', 'ABC') and I want to define f now. – NLV Feb 24 '11 at 09:05
  • do you plan to use this `someuniquevalue` outside of .NET (i.e. as database PK)? Otherwise, I think my answer is the most simple and straight forward. – HuBeZa Feb 24 '11 at 09:21

9 Answers9

5

Compute a message digest of both strings and XOR the values

MD5(x) ^ MD5(Y)

The message digest gives you unique value for each string and the XOR makes it possible for f(x, y) to be equal to f(y, x).

EDIT: As @Phil H observed, you have to treat the case in which you receive two equal strings as input, which would generate 0 after the XOR. You could return something like an MD5(x+y) if x and y are the same, and MD5(x) ^ MD5(y) for the rest of values.

  • This is just reasonably unique, not really unique, as any hashing algorhytm, by definition, can produce the same output for different inputs. – SWeko Feb 24 '11 at 09:12
  • The only problem is that MD5(x)^MD5(x) = 0 - so you get the same result for any pair of matching strings. – Phil H Feb 24 '11 at 09:17
  • @Phil H: You are correct! f(x, x) and f(y, y) should be treated as special cases. Thanks for spotting that. –  Feb 24 '11 at 09:23
  • 1
    @SWeko : MD5 generates 128 bits so there can be the same result for different string. I just used them for simplicity (everybody knows what MD5 does). A stronger message digest algorithm can be used if needed –  Feb 24 '11 at 09:28
3

Just create a new class and override Equals & GetHashCode:

class StringTuple
{
    public string StringA { get; set; }
    public string StringB { get; set; }

    public override bool Equals(object obj)
    {
        var stringTuple = obj as StringTuple;
        if (stringTuple == null)
            return false;

        return (StringA.Equals(stringTuple.StringA) && StringB.Equals(stringTuple.StringB)) ||
            (StringA.Equals(stringTuple.StringB) && StringB.Equals(stringTuple.StringA));
    }

    public override int GetHashCode()
    {
        // Order of operands is irrelevant when using *
        return StringA.GetHashCode() * StringB.GetHashCode();
    }
}
HuBeZa
  • 4,715
  • 3
  • 36
  • 58
  • `*` is usually a bad choice for combining multiple hashcodes - if one string's code is 0, it doesn't matter what the other string is, so all such pairs collide. In addition, you'd also need to wrap this in an `unchecked` block, to prevent overflows. See e.g. the answers to this question: http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode – Damien_The_Unbeliever Feb 24 '11 at 09:32
  • @Damien_The_Unbeliever: in general, you are right, but: (1)string hash method is not likely to produce 0. (2)You must use an order agnostic operator like + or * (Do you have a better one?). (3)Of curse you need to preform it as `unchecked`, and check for null references (after all string is an object) and use a better name than `StringA`, but I would like to leave some work for NLV. Otherwise, what should he do with the rest of his day :) – HuBeZa Feb 24 '11 at 09:45
  • I realized there is no perfect solution for generating the unique string for a pair and all the solutions provided here are close to perfect. As Sweko suggested this is not a good solution(approach) to ANY problem I've changed the logic of the solution to my problem to avoid this :). – NLV Feb 24 '11 at 11:05
2

Just find a unique way of ordering them and concatenate with a separator.

def uniqueStr(strA,strB,sep):
    if strA <= strB:
        return strA+sep+strB
    else:
        return strB+sep+strA

For arbitrarily long lists of strings, either sort the list or generate a set, then concatenate with a separator:

def uniqueStr(sep,strList):
    return sep.join(Set(strList));

Preferably, if the strings are long or the separator choice is a problem, use the hashes and hash the result:

def uniqueStr(sep,strList):
    return hash(''.join([hash(str) for str in Set(strList)]))
Phil H
  • 19,928
  • 7
  • 68
  • 105
1

I think the following should yield unique strings:

String f = Replace(StringA<StringB?StringA:StringB,"@","@@") + "}@{" + Replace(StringA<StringB?StringB:StringA,"@","@@")

(That is, there's only one place in the string where a single "@" sign can appear, and we don't have to worry about a run of "@"s at the end of StringA being confused with a run of "@"s at the start of StringB.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
0

What about StringC = StringA + StringB;.

That is guaranteed to be unique for any combination of StringA or StringB. Or did you have some other considerations for the string also?

You can for example combine the strings and take the MD5 hash of it. Then you will get a string that is probably "unique enough" for your needs, but you cannot reverse the hash back into the strings again, but you can take the same strings and be sure that the generated hash will be the same the next time.

EDIT

I saw your edit now, but I feel it's only a matter of sorting the strings first in that case. So something like

StringC = StringA.CompareTo(StringB) < 0 ? StringA + StringB : StringB + StringA;
Øyvind Bråthen
  • 59,338
  • 27
  • 124
  • 151
0

You could just sort them and concatenate them, along with, lets, say the lenght of the first word.

That way f("one","two") = "onetwo3", f("two","one") = "onetwo3", and no other combination would produce that unique string as , e,g, "onet", "wo" would yield "onetwo4"

However, this will be a abysmal solution for reasonably long strings.

You could also do some sort of hash code calculcation, like this

first.GetHashCode() ^ second.GetHashCode()

that would be reasonably unique, however, you can't guarantee uniqueness.

It would be nice if the OP provided a little more context, because this does not sound like a sound solution to any problem.

SWeko
  • 30,434
  • 10
  • 71
  • 106
0

You can use x.GetHashCode(). That not ensures that this will be unique, but quite. See more information in this question.

For example:

public int GetUniqueValue(string x, string y)
{
    unchecked {
        var result = x.GetHashCode() * x.GetHashCode();
        return result;
    }
}
Community
  • 1
  • 1
Marc Climent
  • 9,434
  • 2
  • 50
  • 55
0

Well take into consideration the first letter of each string before combining them? So if it is alphabetically ordered f(x, y) = f(y, x) will be true.

if(x > y) c = x + y; else c = y + x;

Alex Hope O'Connor
  • 9,354
  • 22
  • 69
  • 112
0
public static String getUniqString(String x,String y){
    return (x.compareTo(y)<0)?(x+y):(y+x);
}
Jessu
  • 2,069
  • 1
  • 15
  • 19
  • Sorry! I haven't read all answers. Unfortunately my answer is copy of @Phil H's answer. – Jessu Feb 24 '11 at 09:26