3

An academic question. This function computes Benford's Law for integers up to maxvalue, and prints a summary table. I've tried a nested for-loop method, a dict method, and this collections method. The latter (code below) seems the fastest (timeit result: 1.4852424694 sec), but is there a faster and memory-efficient method for cycling through so many possibilities?

from __future__ import print_function
def BenfordsLaw4(maxvalue = 10**6):
    from collections import Counter
    sqList = (str((i+1)**2)[0] for i in range(maxvalue))
    BenfordList = Counter(sqList)

    print("Benford's Law for numbers between 1 and", maxvalue, "\nDigits,\t\t\t", "Count,\t\t\t", "Percentage")
    for i,j in sorted(BenfordList.iteritems()):
        print(',\t\t\t\t'.join([str(i), str(j), str(j*100./maxvalue)+' %']))
mtrw
  • 34,200
  • 7
  • 63
  • 71
ksed
  • 335
  • 1
  • 4
  • 13
  • What were the times for the other methods? – mtrw May 28 '13 at 17:03
  • 1
    In Python 2.7, `xrange` might be a bit faster than `range` and will definitely be more memory efficient. – Gabe May 28 '13 at 17:06
  • What's the `**2` doing in there? That's probably going to take most of the time and causes you to measure the powers of 2 up to 2^1,000,000 not integers up to 1,000,000. – Gabe May 28 '13 at 17:10
  • 1
    A counter is also not the most efficient way to do what you want. I had an old question about this and you should be able to get a speed increase of about an order of magnitude using the accepted answer to this question: http://stackoverflow.com/questions/16715242/most-efficient-histogram-code-in-python – Slater Victoroff May 28 '13 at 17:25
  • Thanks all, FWIW, the nested-loops and dict.setdefault methods gave me timeit times of 5.17980374004 and 2.2712666522 sec, respectively. The **2 looks at the squares of a sequence. I often deal with large datasets and wanted a simple series with a large data-signature as an example. Will try Gabe and Slater's answers out next. The xrange method shaves only 0.02 seconds from the answers above. – ksed May 28 '13 at 17:57

1 Answers1

1

Changing the main loop to this:

def BenfordsLaw4(maxvalue = 10**6):
    BenfordList = {str(i+1):0 for i in range(9)}
    for i in (str((i+1)**2)[0] for i in xrange(maxvalue)):
        BenfordList[i] += 1

takes the time from about 1.55s to about 1.25; however taking out the **2 takes the time down to about 0.32s.

In other words, the vast majority of your time is spent squaring your operands.

Curiously, I was able to shave about 0.05s by using "%s" % ((i+1)**2) instead of str((i+1)**2).

Gabe
  • 84,912
  • 12
  • 139
  • 238
  • Sweet! I figured the bulk of the time is spent generating the numbers (i.e. the function or data-read), but I really like that your answer is faster yet (1.2564006048 sec) and avoids importing a special method. – ksed May 28 '13 at 18:59
  • Could just change the range to be `range(1, 10)` and avoid the `+1` ops – Jon Clements May 28 '13 at 19:00