-3

I have implemented a function Generating numbers in [0,x] by given string :


import org.apache.commons.codec.digest.DigestUtils;

public class BaseUtil {

    public static int getIntValue(String stringText, String x) {
        var modNum = new BigInteger(x);
        String sha256hex = DigestUtils.sha256Hex(stringText);
        var result = new BigInteger(sha256hex, 16);
        int intValue = result.mod(modNum).intValue();
        return intValue;
    }
}

Does the return intValue obey uniform distribution in [0,x] for much random string?

DachuanZhao
  • 1,181
  • 3
  • 15
  • 34
  • 1
    May be a duplicate of [others](https://duckduckgo.com/?q=site%3Astackoverflow.com+generate+numbers+in+general+distribution&t=osx&ia=web) as well. Always search Stack Overflow before posting. – Basil Bourque Sep 23 '21 at 03:35
  • I'm sure it's not a duplicate question . Please remove the tag . – DachuanZhao Sep 23 '21 at 03:54
  • 1
    Please explain what you mean by "numbers in [1,10] by given string" and "numbers in [a,b] by given string". I see your code, and understand what your code actually does ... but I cannot figure out how that your characterization of what it does. (The result of calling your method is not a number in the range `[1,10]`, it doesn't contain 1 to 10 decimal digits if you print it in decimal, it doesn't in any way constrain the digits you can display the number in. I am at a total loss as to what you mean.) – Stephen C Sep 23 '21 at 04:21
  • List numbers = Stream.iterate(Integer.parseInt(a), n -> n + 1) .limit(Integer.parseInt(b)) .collect(Collectors.toList()); – sunleo Sep 23 '21 at 04:24
  • List range = IntStream.rangeClosed(typeCased a, typecasted b) .boxed().collect(Collectors.toList()); – sunleo Sep 23 '21 at 04:26
  • I have edited my question , thanks all . – DachuanZhao Sep 23 '21 at 06:12
  • Why should there be a normal distribution involved here? Not to mention that the normal distribution is defined for real numbers, not integers. – Olivier Sep 23 '21 at 06:51
  • Uniform distribution ... It's a mistake ... – DachuanZhao Sep 23 '21 at 07:00
  • 1
    Well, it should be more or less uniform, yes. – Olivier Sep 23 '21 at 07:12
  • There are significantly more efficient ways to generate uniformly distributed numbers from strings. For a start, you don't need to use `BigInteger` or turn the digest into a hex string. – Stephen C Sep 23 '21 at 07:56
  • @StephenC Could you add an answer to show that ? thanks ~ – DachuanZhao Sep 23 '21 at 09:03

1 Answers1

1

Does the method Generating int number in [0,x] by given string follow uniform distribution?

Almost. It is difficult to entirely eliminate non-uniformity ... unless 2256 happens to be evenly divisible by x.

Note that you are actually generating a number in [0,x) ... since the result cannot be x.


You also asked about more efficient implementations than the one in your question.

public class BaseUtil {

    public static int getIntValueV1(String stringText, int x) {
        if (x <= 0) {
            throw new InvalidArgumentException("x must be strictly positive");
        }
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(
            stringText.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(hash).mod(new BigInteger(x)).intValue()
    }

    public static int getIntValueV2(String stringText, int x) {
        if (x <= 0) {
            throw new InvalidArgumentException("x must be strictly positive");
        }
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(
            stringText.getBytes(StandardCharsets.UTF_8));
        ByteBuffer buff = new ByteBuffer(hash);
        long number = 0;
        for (int i = 0; i < 8; i++) {
            number = number ^ buff.getLong();
        }
        return (int)(number % x);
    }
}

Preconditions:

  • Since the result of your method is an int, that implies that x must also be an int.

  • Since you want numbers in the range [0,x), that implies that x must be greater than zero.

Implementations:

I am using the standard MessageDigest class since it has been in Java SE since Java 5 (if not earlier).

The first version uses BigInteger to minimize the non-uniformity when we reduce the bytes into a number in the range [0, x)

The second version uses long arithmetic to compute the remainder. I think that means that the distribution might be a bit more non-uniform than the first version. (My math skills are too rusty ...) It would also be possible to eliminate the use of ByteBuffer to convert the bytes to a sequence of longs.

I have not benchmarked these versions to see which is faster. But both should be faster than producing an intermediate hexadecimal string and parsing it.

Note that you could probably get away with using a less expensive hashing algorithm, depending on the actual use-case for this generator.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216