A more efficient Benford's Law code?

Question

An academic question. This function computes Benford's Law for integers up to maxvalue, and prints a summary table. I've tried a nested for-loop method, a dict method, and this collections method. The latter (code below) seems the fastest (timeit result: 1.4852424694 sec), but is there a faster and memory-efficient method for cycling through so many possibilities?

from __future__ import print_function
def BenfordsLaw4(maxvalue = 10**6):
    from collections import Counter
    sqList = (str((i+1)**2)[0] for i in range(maxvalue))
    BenfordList = Counter(sqList)

    print("Benford's Law for numbers between 1 and", maxvalue, "\nDigits,\t\t\t", "Count,\t\t\t", "Percentage")
    for i,j in sorted(BenfordList.iteritems()):
        print(',\t\t\t\t'.join([str(i), str(j), str(j*100./maxvalue)+' %']))

In Python 2.7, `xrange` might be a bit faster than `range` and will definitely be more memory efficient. — Gabe, May 28 '13 at 17:06
What's the `**2` doing in there? That's probably going to take most of the time and causes you to measure the powers of 2 up to 2^1,000,000 not integers up to 1,000,000. — Gabe, May 28 '13 at 17:10
A counter is also not the most efficient way to do what you want. I had an old question about this and you should be able to get a speed increase of about an order of magnitude using the accepted answer to this question: http://stackoverflow.com/questions/16715242/most-efficient-histogram-code-in-python — Slater Victoroff, May 28 '13 at 17:25
Thanks all, FWIW, the nested-loops and dict.setdefault methods gave me timeit times of 5.17980374004 and 2.2712666522 sec, respectively. The **2 looks at the squares of a sequence. I often deal with large datasets and wanted a simple series with a large data-signature as an example. Will try Gabe and Slater's answers out next. The xrange method shaves only 0.02 seconds from the answers above. — ksed, May 28 '13 at 17:57

Gabe · Accepted Answer · 2013-05-28T18:36:11.240

1

Changing the main loop to this:

def BenfordsLaw4(maxvalue = 10**6):
    BenfordList = {str(i+1):0 for i in range(9)}
    for i in (str((i+1)**2)[0] for i in xrange(maxvalue)):
        BenfordList[i] += 1

takes the time from about 1.55s to about 1.25; however taking out the **2 takes the time down to about 0.32s.

In other words, the vast majority of your time is spent squaring your operands.

Curiously, I was able to shave about 0.05s by using "%s" % ((i+1)**2) instead of str((i+1)**2).

edited May 28 '13 at 18:36

answered May 28 '13 at 18:29

Gabe

84,912
12
139
238

Sweet! I figured the bulk of the time is spent generating the numbers (i.e. the function or data-read), but I really like that your answer is faster yet (1.2564006048 sec) and avoids importing a special method. – ksed May 28 '13 at 18:59
Could just change the range to be `range(1, 10)` and avoid the `+1` ops – Jon Clements May 28 '13 at 19:00

A more efficient Benford's Law code?

1 Answers1