1

Related: Generate a unique string based on a pair of strings

I want to generate an intuitive unique string to denote an ordered pair of strings.

Clearly, stringA + stringB is very intuitive but not unique if you consider, for example, "st" + "ring" == "stri" + "ng" == "string".

Also, unlike the linked OP, I'd like to have uniqueString(stringA, stringB) != uniqueString(stringB, stringA), i.e. noncommutative. Something like MD5(stringA) - MD5(stringB) might work considering the linked OP, but I feel it's very unintuitive.

Any ideas?

akai

3 Answers

1

If tasked with such a problem, I would try a CSV-like approach, e.g.

  • stringA + stringB => stringA;stringB

  • stringA + string;B => stringA;"string;B"

  • stringA + string"B => stringA;"string""B"
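
Many languages ship a CSV writer that already applies these quoting rules, so the escaping does not have to be hand-written. A minimal Python sketch, assuming the standard `csv` module with `;` as the delimiter; `csv_concat` and `csv_split` are hypothetical helper names, not library functions:

```python
import csv
import io


def csv_concat(a: str, b: str) -> str:
    """Join an ordered pair into one CSV-style record (hypothetical helper)."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes a field only when it contains the delimiter,
    # the quote character, or a line-terminator character.
    writer = csv.writer(buf, delimiter=";", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerow([a, b])
    # Drop the default "\r\n" record terminator added by the writer.
    return buf.getvalue()[:-2]


def csv_split(record: str) -> tuple[str, str]:
    """Recover the original pair from a record built by csv_concat."""
    reader = csv.reader(io.StringIO(record, newline=""),
                        delimiter=";", quotechar='"')
    a, b = next(reader)
    return a, b


print(csv_concat("stringA", "stringB"))    # stringA;stringB
print(csv_concat("stringA", "string;B"))   # stringA;"string;B"
print(csv_concat("stringA", 'string"B'))   # stringA;"string""B"
```

Because the record can be parsed back into exactly one pair, two different pairs cannot produce the same record, and swapping the arguments gives a different result whenever the strings differ.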

Locoluis
  • Looks nice, but implementing this by yourself sounds super error prone. Is there any reference implementation or a function like csvConcat in any language? – akai Jun 12 '17 at 16:48
  • This seems a lot more complex and confusing than just using a single escape character, which you also escape, e.g `,` -> `\,` and `\ ` -> `\\ ` (as shown in dornadigital's answer). – Bernhard Barker Jun 12 '17 at 18:14
1

Encode the length of the first string into the resulting string; that way, you know where the split is, and "xy" + "z" is different from "x" + "yz".
Zero-pad the length, so that it always has the same number of digits (depending on the maximum length of the strings).

Examples (with a maximum string length of 999):

"xxx" + "yyy" = "003xxxyyy"  
"xx" + "xyyy" = "002xxxyyy"
"xxxyyy" + "" = "006xxxyyy"  
"" + "xxxyyy" = "000xxxyyy"  
"" + ""       = "000"

Alternatively, if the maximum length of the string is unknown, you could use a delimiter after the length:

"xxx" + "yyy" = "3;xxxyyy"  

You don't have to use a special character for this, or escape the delimiter in the strings, because there is no ambiguity:

"a;b" + ";c;" = "3;a;b;c;" = length + delimiter + "a;b;c;"
0

This feels very much like a serialization issue... Put two values in the same place and still be able to separate them afterwards.

One of the simplest ways is to use a delimiter, à la CSV, though that requires choosing a character or sequence of characters that marks the split unambiguously.

Any delimiter can be made unambiguous by escaping: put a '\' before every instance of the delimiter in each string, and before every instance of '\' itself.

As an example:

"hello, " + "wor\d"
"hello\, " + "wor\\d" //Add in the escape characters
"hello\, ,wor\\d" //Second comma is not escaped, parser knows to split the string back into two components there
dornadigital