
Is there a standard way to do this?

Googling -- "approximate entropy" bits -- uncovers multiple academic papers but I'd like to just find a chunk of pseudocode defining the approximate entropy for a given bit string of arbitrary length.

(In case this is easier said than done and it depends on the application: my application involves 16,320 bits of encrypted data (cyphertext). It's encrypted as a puzzle, though, and not meant to be impossible to crack. I thought I'd first check the entropy, but couldn't easily find a good definition of it. So it seemed like a question that ought to be on StackOverflow! Ideas for where to begin with de-cyphering 16k random-seeming bits are also welcome...)

See also this related question:
What is the computer science definition of entropy?

dreeves

7 Answers


Entropy is not a property of the string you got, but of the strings you could have obtained instead. In other words, it qualifies the process by which the string was generated.

In the simple case, you get one string out of a set of N possible strings, where each string has the same probability of being chosen as every other, i.e. 1/N. In that situation, the string is said to have an entropy of N. The entropy is often expressed in bits, which is a logarithmic scale: an entropy of "n bits" is an entropy equal to 2^n.

For instance: I like to generate my passwords as two lowercase letters, then two digits, then two lowercase letters, and finally two digits (e.g. va85mw24). Letters and digits are chosen randomly, uniformly, and independently of each other. This process may produce 26*26*10*10*26*26*10*10 = 4569760000 distinct passwords, and all these passwords have an equal chance of being selected. The entropy of such a password is then 4569760000, which means about 32.1 bits.
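
To make the bit figure concrete, here is a minimal sketch (my addition, not part of the original answer) that converts the count of equally likely passwords into bits:

import math

# Entropy in bits of a uniform choice among N equally likely outcomes is log2(N).
n_passwords = 26 * 26 * 10 * 10 * 26 * 26 * 10 * 10   # 4,569,760,000
print(math.log2(n_passwords))                          # ~32.09 bits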

Thomas Pornin
  • This is correct but I may not have asked the question correctly. See the answer I gave which perhaps indicates the question I meant to ask. But I think it may in fact be standard to refer to the "approximate entropy" of a bitstring. Regardless, this answer is useful and relevant; thanks! – dreeves Jun 08 '10 at 18:18
  • @specializt The answer has constraints for the characters, so the available alphabet is not 36 characters for every character in the password. For unconstrained 8-character passwords from an alphabet of 36 characters, your calculation is correct; but with the explanation in the answer, the added constraints actually make it more interesting, and somewhat more illustrative. – tripleee Aug 27 '15 at 09:34
  • @tripleee the constraint in this answer is exactly "36" -- a-z and 0-9. Also: you are contradicting yourself - at first you acknowledge that I calculated that very constraint and immediately after that you claim it's "unconstrained". Are you confused, perhaps? – specializt Aug 27 '15 at 09:56
  • The constraints are that the first two are lowercase alphabetics (alphabet is 26 characters), the next two are numbers (alphabet is 10 characters), etc. I don't see how I can make this any clearer than it already is. – tripleee Aug 27 '15 at 10:04
  • @tripleee wrong, you should read the answer again. Here's a clue: `Letters and digits are chosen randomly, uniformly, and independently of each other.` -- So any combination is valid, even strings like "00000000" and whatnot. – specializt Aug 27 '15 at 10:05
  • Sheesh. *Under those constraints.* 00000000 violates the constraints. Within the group of first two alphabetics, they are chosen randomly, uniformly, and independently. Then two digits are pulled out of the digit pool, randomly, uniformly, and independently. – tripleee Aug 27 '15 at 10:13
  • For anyone reading this post, please note that GPUs are now fast enough to do offline brute force attacks on 8 character alphanumeric string hashes (the entire space) in a few hours. 12-16 character passwords are much better. – Halcyon Dec 05 '18 at 15:04

Shannon's entropy equation is the standard method of calculation. Here is a simple implementation in Python, shamelessly copied from the Revelation codebase, and thus GPL licensed:

import math


def entropy(string):
    "Calculates the Shannon entropy of a string"

    # get probability of chars in string
    prob = [ float(string.count(c)) / len(string) for c in dict.fromkeys(list(string)) ]

    # calculate the entropy
    entropy = - sum([ p * math.log(p) / math.log(2.0) for p in prob ])

    return entropy


def entropy_ideal(length):
    "Calculates the ideal Shannon entropy of a string with given length"

    # Assumes every one of the `length` symbols is distinct and equally
    # likely, so this simplifies to log2(length).
    prob = 1.0 / length

    return -1.0 * length * prob * math.log(prob) / math.log(2.0)

Note that this implementation assumes that your input bit-stream is best represented as bytes. This may or may not be the case for your problem domain. What you really want is your bitstream converted into a string of numbers. Just how you decide on what those numbers are is domain specific. If your numbers really are just ones and zeros, then convert your bitstream into an array of ones and zeros. The conversion method you choose will affect the results you get, however.
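
As a hypothetical illustration (not from the original answer), here is how the choice of representation changes what the function above reports; the 800-bit stream and its byte grouping are made-up examples:

# Hypothetical data: a very repetitive 800-bit stream.
bits = "01100001" * 100

# Treated as a string of '0'/'1' symbols, only per-bit frequencies matter.
print(entropy(bits))      # ~0.95 bits per symbol

# Treated as 8-bit groups, the repetition becomes obvious.
chunks = "".join(chr(int(bits[i:i+8], 2)) for i in range(0, len(bits), 8))
print(entropy(chunks))    # essentially 0.0: every byte is identical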

fmark
  • Ah, thank you! But this requires you to know the word length in your bit string? For example, I could apply this to my string of 16,320 bits if I assume that those are really 2040 bytes. – dreeves Jun 05 '10 at 04:56
  • Edited answer to provide info about this – fmark Jun 05 '10 at 05:08
  • If you convert to just ones and zeros then won't this algorithm treat "0101010101..." as having the maximum possible entropy? – dreeves Jun 05 '10 at 05:25
  • No. It should return `1`, whereas the maximum entropy grows according to the string length, as per the `entropy_ideal()` function. For a 1 bit string `1` the entropy is `0` and the max entropy is `0`. For a two bit string `10` the entropy is `1` and the max entropy is `1`. For a three bit string `101` the entropy remains `1` but the max entropy increases to `~1.5`. As more bits are added, the entropy should remain constant but the maximum possible entropy should increase. – fmark Jun 05 '10 at 06:01
  • But the code you've given seems not to depend on the order of the bits so a string with 0 and 1 in strict alternation should give the same result as any other string with the same number of 0's and 1's. – dreeves Jun 05 '10 at 07:05
  • That is true - is this a problem? If so, using well-known compression algorithms is the easiest proxy for complexity. – fmark Jun 05 '10 at 09:38
  • As per cypherpunks answer, this assumes a model where every character is equally likely at every position. – President James K. Polk Jun 05 '10 at 12:34
  • @fmark @dreeves The information entropy depends on the number of states available. As binary strings only have 2 possible states the maximum entropy is always 1. – Chris de Vries Nov 10 '10 at 11:32
  • @cdv: Yes, but **2 states per character**! You claim a random 512 bit string has the same entropy as a single bit ``0``/``1``. This is just wrong. Entropy characterizes what we don't know about a system, or how hard it is to guess the correct bit pattern, and additive in string length (because it is ``log(number of possible values)``). To correct, multiply the return value by ``len(string)``. – Christian Aichinger Apr 29 '15 at 03:37
  • This is incorrect. It needs to consider the total number of characters being used. For example, "01010101" has an entropy of 1.0 and a maximum/ideal entropy of 1.0 because it's binary. This formula will always require as many characters as there are slots in order to reach maximum entropy. It could easily accept a parameter for the total number of available characters for the given string. – Xodarap777 Feb 14 '19 at 08:29
  • To check my assumption, I tried running it on the first 1000 digits of pi, which feature a statistically even and random distribution of digits 0-9. The entropy came out correctly at ~3.3, but `entropy_ideal` assumed there were 1000 possible characters. The solution I used was to call `entropy_ideal(string)` and replace both instances of `length` with `len(set(string))` - assuming that all possible characters are used. It could easily be a manual input, too. – Xodarap777 Feb 14 '19 at 09:09

I believe the answer is the Kolmogorov Complexity of the string. Not only is this not answerable with a chunk of pseudocode, but Kolmogorov complexity is not even a computable function!

One thing you can do in practice is compress the bit string with the best available data compression algorithm. The more it compresses, the lower the entropy.
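
As a rough sketch of that idea (my example, using zlib as a stand-in for "the best available" compressor; the sample data is made up), the ratio of compressed size to original size serves as a crude proxy:

import os
import zlib

def compression_ratio(data: bytes) -> float:
    # Closer to 1.0 means the compressor found no structure (high apparent entropy).
    return len(zlib.compress(data, 9)) / len(data)

patterned = b"0123456789" * 204        # 2040 bytes (16,320 bits) of obvious structure
random_ish = os.urandom(2040)          # 2040 bytes of OS-supplied randomness

print(compression_ratio(patterned))    # small: highly compressible, low entropy
print(compression_ratio(random_ish))   # about 1.0 or slightly above: incompressible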

dreeves
  • A small correction: low compression indicates low entropy, since low entropy equals low disorder. [Entropy, Compression, and Information Content](http://www.isi.edu/~vfossum/entropy.pdf) – lsalamon Mar 15 '13 at 18:43
  • "Drawing on these intuitions, Shannon developed a measure of entropy for language that assigns high entropy to the disordered, random first sentence, and low entropy to the ordered, patterned second sentence"... from your cited paper @isalamon – VP. Jul 05 '13 at 09:59
  • @lsalamon, link is broken. – Valmiky Arquissandas Dec 09 '16 at 16:46
  • @ValmikyArquissandas, here is another paper about [Entropy](http://prac.im.pwr.edu.pl/~downar/english/documents/entropy.pdf) – lsalamon Dec 13 '16 at 11:55
  • @lsalamon High compression => low entropy. Low compression => high entropy. – mechanicious Mar 03 '18 at 21:04

The NIST Random Number Generator evaluation toolkit has a way of calculating "Approximate Entropy." Here's the short description:

Approximate Entropy Test Description: The focus of this test is the frequency of each and every overlapping m-bit pattern. The purpose of the test is to compare the frequency of overlapping blocks of two consecutive/adjacent lengths (m and m+1) against the expected result for a random sequence.

And a more thorough explanation is available from the PDF on this page:

http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html
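
As I read that description (this is my own sketch, not NIST's reference code), the statistic compares the empirical log-frequencies of overlapping m-bit and (m+1)-bit patterns, with the sequence wrapped around so that every position starts a full block:

import math
from collections import Counter

def nist_apen(bits, m):
    # Approximate Entropy statistic in the spirit of NIST SP 800-22.
    # bits is a string of '0'/'1' characters; m is the block length.
    n = len(bits)

    def phi(block_len):
        # Wrap the sequence so all n overlapping blocks are full length.
        padded = bits + bits[:block_len - 1]
        counts = Counter(padded[i:i + block_len] for i in range(n))
        # Sum of (relative frequency) * ln(relative frequency) over observed patterns.
        return sum((c / n) * math.log(c / n) for c in counts.values())

    return phi(m) - phi(m + 1)

# A random-looking sequence gives a value near ln(2) ~= 0.693;
# a strongly patterned one gives something much smaller.
print(nist_apen("01" * 1000, 2))   # 0.0 for a perfectly periodic string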

rob
  • This might be a bit late, but I found this pretty new code snippet on github for the NIST implementation of ApEn: https://gist.github.com/StuartGordonReid/ff86c5a895fa90b0880e – Chaoste Feb 28 '19 at 13:42

There is no single answer. Entropy is always relative to some model. When someone talks about a password having limited entropy, they mean "relative to the ability of an intelligent attacker to predict", and it's always an upper bound.

Your problem is, you're trying to measure entropy in order to help you find a model, and that's impossible; what an entropy measurement can tell you is how good a model is.

Having said that, there are some fairly generic models that you can try; they're called compression algorithms. If gzip can compress your data well, you have found at least one model that can predict it well. And gzip is, for example, mostly insensitive to simple substitution: it can handle "wkh" occurring frequently in the text as easily as it can handle "the".
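
A toy check of that claim (my own sketch; the text and the Caesar shift are made-up examples): zlib compresses a letter-substituted copy of a text almost exactly as well as the original, while random bytes of the same length barely compress at all:

import os
import zlib

text = b"the quick brown fox jumps over the lazy dog " * 200
# Caesar-shift lowercase letters by 3 ("the" becomes "wkh"), a simple substitution.
shifted = bytes((b - 97 + 3) % 26 + 97 if 97 <= b <= 122 else b for b in text)

print(len(zlib.compress(text)))                    # small: lots of structure
print(len(zlib.compress(shifted)))                 # about the same size as above
print(len(zlib.compress(os.urandom(len(text)))))   # close to 9000: no structure found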


Use the Shannon entropy of a word, H = -Σ pᵢ·log₂(pᵢ), where pᵢ is the frequency of each character i in the word (formula image: https://i.stack.imgur.com/GBBJG.jpg).

Here's an O(n) algorithm that calculates it:

import math
from collections import Counter


def entropy(s):
    l = float(len(s))
    return -sum(map(lambda a: (a/l)*math.log2(a/l), Counter(s).values()))
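
Hypothetical usage on a couple of bit strings (not part of the original answer):

print(entropy("0101010101"))   # 1.0: '0' and '1' are equally frequent
print(entropy("0000000001"))   # ~0.47: heavily skewed towards '0'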
Thomas Dussaut

Here's an implementation in Python (I also added it to the Wiki page):

import numpy as np

def ApEn(U, m, r):
    """Approximate entropy of sequence U, using pattern length m and
    tolerance r (two length-m patterns "match" when they differ by at
    most r in every position)."""

    def _maxdist(x_i, x_j):
        return max(abs(ua - va) for ua, va in zip(x_i, x_j))

    def _phi(m):
        # All overlapping length-m windows of U.
        x = [[U[j] for j in range(i, i + m)] for i in range(N - m + 1)]
        # C[i]: fraction of windows within tolerance r of window i (itself included).
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    N = len(U)

    return _phi(m) - _phi(m + 1)

Example:

>>> U = np.array([85, 80, 89] * 17)
>>> ApEn(U, 2, 3)
1.0996541105257052e-05

The above example is consistent with the example given on Wikipedia.
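
For what it's worth (my own usage sketch, not from the original answer): m is the length of the patterns being compared and r is the tolerance for counting two patterns as a match. On a 0/1 sequence, r=0 means only exact matches count; note the implementation is O(N^2), so it gets slow for long inputs:

# Hypothetical use on short 0/1 sequences (quadratic cost, so keep them small).
print(ApEn(np.random.randint(0, 2, size=600), m=2, r=0))   # near ln(2) ~= 0.693 for random bits
print(ApEn(np.array([0, 1] * 300), m=2, r=0))              # near 0 for a periodic sequence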

Ulf Aslak
  • what are m and r? – Shannon Jan 29 '19 at 21:21
  • I also have another question. Using this function, I'll get 0 for randU = np.random.choice([0, 1], size=17 * 3), m = 2, r = 3. Is this normal? – Shannon Jan 31 '19 at 04:10
  • @Shabnam I honestly can't remember, I haven't used this for a while. But I tested my implementation pretty thoroughly when I wrote it. If you check out the wikipedia article I'm sure you can understand. – Ulf Aslak Feb 01 '19 at 10:15