4

I need to check if a string is fairly random without performing frequency analysis because it will be too time consuming. Is there such algorithm already out there? I am building this with java but a generic description of an algorithm will be also very useful.

Clarification: To the human eye, the following text is somehow random.... dsfsddsfdsfsddsfs .... or even po340-3gk30g3gkf;glkp.

I don't want to know for sure how random it is. I just want to detect, pretty much the way a human being will, if a string is random looking without measuring it's actual randomness.

Martlark
  • 14,208
  • 13
  • 83
  • 99
Pass
  • 1,501
  • 4
  • 21
  • 39
  • Maybe you are able to adopt some of the Diehard tests to your string sequences: http://en.wikipedia.org/wiki/Diehard_tests – fjdumont Feb 21 '11 at 10:12
  • 2
    "..without performing frequency analysis because it will be **too time consuming."** Does the term 'premature optimization' mean anything to you? – Andrew Thompson Feb 21 '11 at 10:41
  • 1
    @fjdumont I don't think those will work for a single value. – shmosel May 31 '16 at 06:15

4 Answers4

8

I need to check if a string is fairly random without performing frequency analysis because it will be too time consuming.

A simple frequence analysis is basically the fastest thing I can imagine. You just traverse the characters in the string (once) and keep track of the counts.

I can't imagine you can find any "randomness-test" that is faster than this.

Besides, I can't really say that your question is clear. Technically any string is as random as any other. If you're after what "looks" random, I suppose you need to look for all kinds of patterns, and this will for sure be too time-consuming for you.

Is this random in your opinion:

String str = "                      o         _        _            _        "
           + "           _o        /\_      _ \\o     (_)\__/o     (_)       "
           + "         _< \_      _>(_)    (_)/<_       \_| \      _|/' \/   "
           + "        (_)>(_)    (_)           (_)      (_)       (_)'  _\o_ ";

It doesn't look very random to me, but I'd have a hard time to define what looks random.

aioobe
  • 413,195
  • 112
  • 811
  • 826
  • 3
    +1 for the nice ASCII art. "I can't imagine you can find any "randomness-test" that is faster than this." Well, you can, and you do in the next paragraph: "Technically any string is as random as any other.". So, `randomness-test=True` is a nice faster example. – ypercubeᵀᴹ Feb 21 '11 at 12:55
5

Measure the length of the string after compressing it. gzip will do.

All compressors work by looking for redundancy in the input. Repetition of substrings is a form of redundancy that corresponds to common intuitive, and mathematical, understandings of non-randomness. gzip and its ilk specifically look for repeated substrings and replace the 2nd and subsequent occurrences with shorter "pointers" back to the original.

The length of the compressed string gives you an upper bound on its Kolmogorov complexity which is in a sense its "absolute randomness", but which can't be measured directly.

Although gzip and other general-purpose compressors will generally produce a header, so that short strings might appear to actually grow in length (i.e. it's not usually the case that length(a short string) < length(compress(a short string))), it's still true in general that length(compress(a short repetitive string)) < length(compress(a short non-repetitive string)), which is hopefully all you need.

j_random_hacker
  • 50,331
  • 10
  • 105
  • 169
  • Good suggestion, but I can assure you that this is far more time consuming that doing a frequency check. – aioobe Feb 21 '11 at 13:05
  • @aioobe: Definitely more time consuming, but a frequency check won't detect as much non-randomness, since it only looks at individual letters rather than substrings. It says that the following 2 strings are equally "random": `abcabcabc`, `abbacbcac`. – j_random_hacker Feb 21 '11 at 23:45
  • Pretty clever, actually. – Josef Sábl Mar 24 '14 at 10:35
0

You can either analyze the algorithm generating the Strings somehow or do the frequency analysis. But I believe there is no way to determine if a String is fairly random.

Is '13530168=dwninwebvp' fairly random?

Jan Zyka
  • 17,460
  • 16
  • 70
  • 118
0

You can detect random strings using common bigrams and less-common bigrams.

Here is Java code:

package checker;

import java.util.HashMap;
import java.util.Map;

public class RandomStringChecker {
    private static final double DEFAULT_THRESHOLD = 0.1;
    private static final Map<String, Double> enBigramsDict = new HashMap<>();

    static {
        // Populate the dictionary of English bigrams here
        enBigramsDict.put("ab", 6.461565670356169);
        enBigramsDict.put("bc", 0.0531330714265234);
        enBigramsDict.put("cd", 0.06273467822837461);
        // ...
    }

    public static boolean isRandomString(String word) {
        return isRandomString(word, DEFAULT_THRESHOLD);
    }

    public static boolean isRandomString(String word, double threshold) {
        // Allow only words longer than 3 characters which contain only English alphabetic characters
        if (word.length() < 4 || !word.matches("[a-zA-Z]+")) {
            return false;
        }

        // Repeating characters
        if (word.chars().distinct().count() == 1) {
            return true;
        }

        // Turn word into lowercase
        word = word.toLowerCase();

        // Get list of bigrams from the word
        String[] bigrams = new String[word.length() - 1];
        for (int i = 0; i < word.length() - 1; i++) {
            bigrams[i] = word.substring(i, i + 2);
        }

        // Get number of common and uncommon bigrams
        int numCommonBigrams = 0;
        for (String bigram : bigrams) {
            if (enBigramsDict.containsKey(bigram) && enBigramsDict.get(bigram) > threshold) {
                numCommonBigrams++;
            }
        }
        int numUncommonBigrams = bigrams.length - numCommonBigrams;

        // Higher number wins
        return numCommonBigrams <= numUncommonBigrams;
    }

    public static void main(String[] args) {
        System.out.println(isRandomString("abcd")); // true
    }
}


You can find enBigramsDict entries here and more details about implementation at Medium blog post here.

nightzus
  • 11
  • 2
  • the question is tagged java :) – kleopatra May 10 '23 at 11:31
  • but you can apply the same logic in Java – nightzus May 11 '23 at 12:16
  • sure - but only if you know python :) Please translate to java to make it an answer to the question. BTW: don't post duplicate answers to multiple questions (this looks like the same as https://stackoverflow.com/questions/6297991/is-there-any-way-to-detect-strings-like-putjbtghguhjjjanika/76216044#76216044) – kleopatra May 11 '23 at 12:42
  • Thank you for your suggestions, will translate to Java. Also, I'd say that the same algorithm can be applied on both questions (you can see the examples are different). – nightzus May 11 '23 at 14:27