
I have a 1 kHz triangle wave generator that I am measuring with the analog input of a PIC microcontroller. The triangle wave generator and the ADC capture run from separate frequency sources. The ADC captures at 100 ksps with 12 [edit: 10] usable bits of precision.

I want to estimate the entropy contained in the analog samples for the purpose of generating true random numbers. The two sources of entropy I have identified are the kelvin (thermal) noise and the offsets between the two frequency sources.

From the captured waveform I can continuously distinguish about two frequencies per second, and I will capture on average one kelvin input threshold upset event per second. So my estimate is about two bits of entropy per second.

Can anyone think of a way to justify a larger entropy estimate?

Based on answers to similar questions already posted on S.O., I'll add the following clarifications:

I am not particularly interested in other ideas for entropy sources, as I would still have to answer this same question for those alternate sources.

Analysis of the data itself for autocorrelation or other measures of randomness is not the correct answer, as such measures will be wildly optimistic.

  • sorry, not help, but i am curious about the terminology "kelvin input threshold upset event". i did google, but am not finding much. do you have a reference for what that is? thanks. – andrew cooke Mar 12 '12 at 23:36
  • I don't know, but generating true random numbers? Ahh, you are a joker. And you will never get 12 usable bits of precision with a PIC ADC, check the datasheet. – GJ. Mar 13 '12 at 00:09
  • To answer this question requires a great deal of knowledge about the physics of your source, and is thus off-topic. – President James K. Polk Mar 13 '12 at 00:15
  • Thanks for the attention. Here are my responses. #1) I made up the term. I mean the number of times I would expect two units to make different readings only due to the voltage caused by thermal noise. #2) GJ is correct. The ADC is 10 bit. #3) This could indeed be the case. I am hoping otherwise. Note that the two sources of entropy that I have identified so far make no assumptions whatsoever about the characteristics of the source. Making a serious attempt to estimate entropy is something a software engineer must do, regardless of the quality of the source. – user1265195 Mar 13 '12 at 01:51
  • "Analysis of the data itself for autocorrelation or other measures of randomness is not the correct answer as they will be wildly optimistic." It's still a good thing to check. I'm not sure why you think it will be "optimistic". http://www.fourmilab.ch/random/ – endolith Apr 24 '12 at 14:35
  • @endolith - It is optimistic because auto-correlation does not measure every type of structure in data. For example, auto-correlation would say that a cryptographic random number generator is very random, when it in fact contains zero entropy. – user1265195 Nov 30 '12 at 01:02

3 Answers


I made some progress that may help others.

Primary resource: http://en.wikipedia.org/wiki/Johnson%E2%80%93Nyquist_noise

The pin capacitance will limit the amount of measurable thermal noise to about 20 µV within the bandwidth of the ADC. This should be more or less the same for a pretty wide variety of controllers. Use a ~10 kΩ resistor between the signal and the pin. Smaller values will reduce the noise but increase the possible sample rate.
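
As a rough check of that figure, a short calculation sketch (the ≈10 pF pin capacitance is an assumed typical value, not a datasheet number; with the RC bandwidth limit, the RMS thermal noise reduces to the familiar kT/C expression):

    #include <math.h>
    #include <stdio.h>

    /* Rough check of the ~20 uV figure.  The series resistor and the pin
     * capacitance set the noise bandwidth, so the RMS thermal noise across
     * the pin reduces to the classic kT/C expression.  C = 10 pF is an
     * assumed, typical pin capacitance. */
    int main(void)
    {
        const double k = 1.380649e-23;   /* Boltzmann constant, J/K    */
        const double T = 300.0;          /* room temperature, K        */
        const double C = 10e-12;         /* assumed pin capacitance, F */

        double v_rms = sqrt(k * T / C);  /* RMS noise voltage across C */
        printf("kT/C noise: %.1f uV RMS\n", v_rms * 1e6);  /* ~20 uV */
        return 0;
    }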

The signal does not need to be random. It just needs to be evenly distributed within the range of at least a few discrete input steps. Note that outputting to a GPIO on the same clock domain as the input might not meet this requirement.

With a 10-bit ADC spanning a 3.3 V input range, each discrete step is about 3 mV. Entropy per sample is around 20 µV / 3 mV ≈ 0.006 bits per sample.

Also note that this doesn't require an analog input. You could do it with a digital input, but the bin size would be much larger (1V?), and the answer would be more like 0.000018 bits per sample. So taking an input sample every millisecond, generating a 64-bit random seed would take about an hour.
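
Putting those two estimates into one small sketch (the 3.3 V range, 10 usable bits, ~1 V digital bin, and 1 ms sample period are the figures assumed above, not measured values):

    #include <stdio.h>

    int main(void)
    {
        const double noise_rms = 20e-6;          /* thermal noise from above, V          */
        const double vref      = 3.3;            /* ADC input range, V                   */
        const int    bits      = 10;             /* usable ADC resolution                */
        const double dig_bin   = 1.0;            /* assumed digital threshold region, V  */
        const double period_s  = 1e-3;           /* one sample per millisecond           */

        double lsb       = vref / (1 << bits);   /* ~3.2 mV per step                     */
        double e_analog  = noise_rms / lsb;      /* ~0.006 bits per sample (rough)       */
        double e_digital = noise_rms / dig_bin;  /* ~0.00002 bits per sample (rough)     */
        double seconds   = 64.0 / e_digital * period_s;  /* time to gather a 64-bit seed */

        printf("analog: %.4f bits/sample, digital: %.6f bits/sample\n",
               e_analog, e_digital);
        printf("64-bit seed from the digital input: ~%.0f minutes\n",
               seconds / 60.0);                  /* on the order of an hour */
        return 0;
    }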


NIST Special Publication SP 800-90B recommends min-entropy as the entropy measure. However, testing the min-entropy of an entropy source is non-trivial; SP 800-90B describes one way such testing can be done.
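
For illustration, the core of the min-entropy measure is just H_min = -log2(p_max) over the sample distribution. A minimal sketch of that naive estimate from a histogram of raw samples follows (the helper name is just illustrative; SP 800-90B layers several further estimators and restart tests on top of this, so it is not a substitute for the full procedure):

    #include <math.h>
    #include <stddef.h>

    /* Naive "most common value" min-entropy estimate per sample:
     * H_min = -log2(p_max), where p_max is the relative frequency of the
     * most frequent sample value in the histogram. */
    double min_entropy_per_sample(const unsigned long *hist, size_t nbins,
                                  unsigned long total)
    {
        unsigned long max_count = 0;
        for (size_t i = 0; i < nbins; i++)
            if (hist[i] > max_count)
                max_count = hist[i];
        if (total == 0 || max_count == 0)
            return 0.0;
        return -log2((double)max_count / (double)total);
    }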

Peter O.

Your question is off-topic if we are talking about "physics" entropy. But we could just as easily sample your analog signal, turn it into a digital waveform, and then discuss entropy in the information-theoretical context.

An easy and surprisingly accurate method for measuring entropy in a digital signal is to attempt to compress it using the best methods available. The higher the compression ratio, the smaller the information content.
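
A sketch of that idea using zlib's compressBound()/compress2() (the result is only an upper bound on the true entropy, since a general-purpose compressor cannot exploit every kind of structure):

    #include <stdlib.h>
    #include <zlib.h>

    /* Upper-bound the information content of a sample buffer by compressing
     * it at maximum effort.  Whatever redundancy zlib finds lowers the bound;
     * redundancy it cannot model is missed, which is why this over-estimates
     * the entropy. */
    double compressed_bits_per_byte(const unsigned char *buf, uLong len)
    {
        uLongf out_len = compressBound(len);
        unsigned char *out = malloc(out_len);
        double result = -1.0;   /* error indicator */

        if (out && len > 0 &&
            compress2(out, &out_len, buf, len, Z_BEST_COMPRESSION) == Z_OK)
            result = 8.0 * (double)out_len / (double)len;  /* bits per input byte */

        free(out);
        return result;
    }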

If your goal is to generate random bits to produce a seed (as alluded to in one of the other answers), a useful approach is to compress randomness sampled from the environment (keyboard strokes, mouse movements, your analog system) using a common compression algorithm, and then discard the dictionary. What remains will have significant information content.

Vinnie Falco
  • The bit about the thermal noise is off topic, I agree. The part about converting the amount of uncertainty in measurement to a measure of bits-of-entropy is on-topic, because system implementers must do this from whatever sources they have available (mouse movements, as you suggested). The reason compression is inappropriate for this estimation is that no compression algorithm can exploit all possible sources of redundancy. – user1265195 Apr 02 '12 at 22:22