0

I'm working on a simple game and I have the requirement of taking a word or phrase such as "hello world" and converting it to a series of numbers.

The criteria is:

  1. Numbers need to be distinct
  2. Need ability to configure maximum sequence of numbers. IE 10 numbers total.
  3. Need ability to configure max range for each number in sequence.
  4. Must be deterministic, that is we should get the same sequence everytime for the same input phrase.

I've tried breaking down the problem like so:

  1. Convert characters to ASCII number code: "hello world" = 104 101 108 108 111 32 119 111 114 108 100
  2. Remove everyother number until we satisfy total numbers (10 in this case)
  3. Foreach number if number > max number then divide by 2 until number <= max number
  4. If any numbers are duplicated increase or decrease the first occurence until satisfied. (This could cause a problem as you could create a duplicate by solving another duplicate)

Is there a better way of doing this or am I on the right track? As stated above I think I may run into issues with removing distinction.

The Muffin Man
  • 19,585
  • 30
  • 119
  • 191
  • Are there any requirements that different words/phrases generate different number series? – Ted Hopp Feb 21 '13 at 17:13
  • @TedHopp I'm sorry, I'm not sure I understand your question fully. – The Muffin Man Feb 21 '13 at 17:15
  • "What's the best algorithm for X" seems like a better fit for PE than SO. – P.Brian.Mackey Feb 21 '13 at 17:15
  • Seems underspecified, since returning a one constant number for any phrase satisfies the stated criteria. What's the use case for this? – bmm6o Feb 21 '13 at 17:24
  • I guess the major question is: what will drive the requirement that configures the maximum numbers and range? Will it be intelligent enough to ensure that the strings to the possible numbers will be pretty sparse? E.g., if you say "maximum sequence should be 1", and try using 11 strings, it's impossible. If these maximums are determined for us, it's done so by something that has an idea of how it expects the function to work. – Scott Mermelstein Feb 21 '13 at 17:35
  • My question is: would a trivial function that converts every string to the same number sequence (say, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) satisfy all the requirements? I don't see anything in your list of requirements that would exclude this. – Ted Hopp Feb 21 '13 at 17:44
  • @TedHopp I'm looking for something that has output that looks a bit more magical, such as a sequence of numbers for a lottery ticket. – The Muffin Man Feb 21 '13 at 17:48
  • If generating numbers for a lottery ticket is the use case, you should say so! Otherwise we're just stumbling around in the dark, finding fault with your requirements or offering suggestions that fail to meet unstated criteria. – bmm6o Feb 22 '13 at 17:28

4 Answers4

1

If you want to limit the size of the output series - then this is impossible.

Proof:
Assume your output is a series of size k, each of range r <= M for some predefined M, then there are at most k*M possible outputs.

However, there are infinite number of inputs, and specifically there are k*M+1 different inputs.

From pigeonhole principle (where the inputs are the pigeons and the outputs are the pigeonholes) - there are 2 pigeons (inputs) in one pigeonhole (output) - so the requirement cannot be achieved.


Original answer, provides workaround without limiting the size of the output series:

You can use prime numbers, let p1,p2,... be the series of prime numbers.
Then, convert the string into series of numbers using number[i] = ascii(char[i]) * p_i
The range of each character is obviously then [0,255 * p_i]

Since for each i,j such that i != j -> p_i * x != p_j * y (for each x,y) - you get uniqueness. However, this is mainly nice theoretically as the generated numbers might grow quickly, and for practical implementation you are going to need some big number library such as java's BigInteger (cannot recall the C# equivalent)

Another possible solution (with the same relaxation of no series limitation) is:

number[i] = ascii(char[i]) + 256*(i-1)

In here the range for number[i] is [256*(i-1),256*i), and elements are still distinct.

amit
  • 175,853
  • 27
  • 231
  • 333
  • What happens when your string spans more characters than the series allows? – NominSim Feb 21 '13 at 17:17
  • @NominSim Well, since there are infinite number of prime numbers, the series is infinite - so the string cannot span longer then it. However, for extremely large strings, finding what the next prime number is might be tricky (but it does not seem to be an issue here) – amit Feb 21 '13 at 17:18
  • I am referring to the OPs tentative "solution" where he says 'Remove everyother number until we satisfy total numbers (10 in this case)' Which seemed to indicate that he wants to be able to limit the output series without necessarily limiting the input. – NominSim Feb 21 '13 at 17:20
  • @NominSim Correct, so if the max series of numbers is 10 and you put in a paragraph we somehow need to trim down the input or implement some way to come up with the 10 numbers by evaluating every number in the input. – The Muffin Man Feb 21 '13 at 17:22
  • @Nick In this case, it is impossible - let me edit the answer and explain why. – amit Feb 21 '13 at 17:23
  • @Nick That is what I thought you meant, unfortunately...that means that you are going to have duplicate outputs for different inputs unless your output "numbers" can actually be infinite (naively) sequences of digits. – NominSim Feb 21 '13 at 17:25
  • @Nick Look at the edit, I added a **proof** this cannot be done - if you limit your output - you must limit your input as well (to at most the same range of possibilities), otherwise - it cannot be achieved. – amit Feb 21 '13 at 17:27
  • It can be achieved like so: If sequence is 10 numbers input must be at least 10 chars. If input is longer than output we remove input. – The Muffin Man Feb 21 '13 at 17:31
  • @Nick That is incorrect, because then you're outputs will not be distinct. – NominSim Feb 21 '13 at 17:32
  • @NominSim Then we iterate over the output to be distinct? – The Muffin Man Feb 21 '13 at 17:41
1

Mathematically, it is theoretically possible to do what you want, but you won't be able to do it in C#:

If your outputs are required to be distinct, then you cannot lose any information after encoding the string using ASCII values. This means that if you limit your output size to n numbers then the numbers will have to include all information from the encoding.

So for your example

"Hello World" -> 104 101 108 108 111 32 119 111 114 108 100

you would have to preserve the meaning of each of those numbers. The simplest way to do this would just 0 pad your numbers to three digits and concatenate them together into one large number...making your result 104101108111032119111114108100 for max numbers = 1. (You can see where the issue becomes, for arbitrary length input you need very large numbers.) So certainly it is possible to encode any arbitrary length string input to n numbers, but the numbers will become exceedingly large.

If by "numbers" you meant digits, then no you cannot have distinct outputs, as @amit explained in his example with the pidgeonhole principle.

NominSim
  • 8,447
  • 3
  • 28
  • 38
  • Note that this makes other relaxation of neglecting "Need ability to configure max range for each number in sequence.". However, as we already established - it cannot be done with at least one relaxation. – amit Feb 21 '13 at 17:40
  • 1
    Unless we can use real numbers, and then just translate the number to a real number (and add a clear distinction for end). In this case, since we have infinite real numbers in range [0,1], we can use this range to represent all strings. – amit Feb 21 '13 at 17:45
  • @amit Thanks, you're right...I like your workaround too...0 with an infinite sequence of numbers in the mantissa is of course still in the range [0,1] :) – NominSim Feb 21 '13 at 17:48
0

Let's eliminate your criteria as easily as possible. For distinct, deterministic, just use a hash code. (Hash actually isn't guaranteed to be distinct, but is highly likely to be):

string s = "hello world";
uint hash = Convert.ToUInt32(s.GetHashCode());

Note that I converted the signed int returned from GetHashCode to unsigned, to avoid the chance of having a '-' appear.

Then, for your max range per number, just convert the base.

That leaves you with the maximum sequence criteria. Without understanding your requirements better, all I can propose is truncate if necessary:

hash.toString().Substring(0, size)

Truncating leaves a chance that you'll no longer be distinct, but that must be built in as acceptable to your requirements? As amit explains in another answer, you can't have infinite input and non-infinite output.

Community
  • 1
  • 1
Scott Mermelstein
  • 15,174
  • 4
  • 48
  • 76
  • Just to point out "highly likely to be" distinct is fairly relative, as there are only 2^32 possibilities for the hash code. – NominSim Feb 21 '13 at 17:54
0

Ok, so in one comment you've said that this is just to pick lottery numbers. In that case, you could do something like this:

public static List<int> GenNumbers(String input, int count, int maxNum)
{
    List<int> ret = new List<int>();
    Random r = new Random(input.GetHashCode());
    for (int i = 0; i < count; ++i)
    {
        int next = r.Next(maxNum - i);
        foreach (int picked in ret.OrderBy(x => x))
        {
            if (picked <= next)
                ++next;
            else
                break;
        }
        ret.Add(next);
    }
    return ret;
}

The idea is to seed a random number generator with the hash code of the String. The rest of that is just picking numbers without replacement. I'm sure it could be written more efficiently - an alternative is to generate all maxNum numbers and shuffle the first count. Warning, untested.

I know newer versions of the .Net runtime use a random String hash code algorithm (so results will differ between runs), but I believe this is opt-in. Writing your own hash algorithm is an option.

bmm6o
  • 6,187
  • 3
  • 28
  • 55