
I have Python 2.7 code that squares the values in a FreqDist (i.e. an NLTK frequency distribution) and then sums all the squares.

For example, from a distribution whose frequencies are 2, 1, 1 and 1, you should get: 2*2 + 1*1 + 1*1 + 1*1 = 7

This works for me, but I was wondering whether there was a "better" way to do it than this:

        total = 0
        for word, frequency in t.freq_dist.iteritems():
            total += frequency * frequency

I'm asking because I then need to loop through freq_dist again for something else right after this code, so I figured it's not good practice to loop through it twice if there's a better way...

Zach

3 Answers

lst = [2, 1, 1, 1]

Using a generator expression:

sum(i**2 for i in lst)

gives

7

Alternatively, list comprehension also works:

sum([i**2 for i in lst])

If you don't need the squared values later, the generator expression is the better choice: it yields each value once, on demand, whereas a list comprehension builds the whole list in memory first. For more information see this SO question comparing list comprehension vs generators.
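Applied to the question, the same generator expression can run directly over the counts. This is only a sketch: `collections.Counter` stands in for `t.freq_dist` here (an assumption, so that the example runs without NLTK), and the sample words are made up.

```python
from collections import Counter

# Stand-in for t.freq_dist (assumption): a mapping of word -> count.
freq_dist = Counter(["the", "the", "cat", "sat", "down"])

# Only the counts matter for the sum, so iterate over the values alone.
total = sum(count * count for count in freq_dist.values())
```

On Python 2.7, `itervalues()` would avoid building the intermediate list that `values()` returns there.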

Community
Levon

If you use NumPy, you can just square the array:

>>> from numpy import array
>>> values = array([2, 1, 1, 1])
>>> sum(values**2)
7

If you're going to be doing repetitive, heavy computations, I'd suggest you use NumPy. Its vectorized array operations can be much faster than an equivalent Python loop on large arrays.

Blender
  • Sadly, there's no array here; NLTK doesn't leverage Numpy as much as it could. – Fred Foo May 26 '12 at 20:01
  • ... and the question is whether it would be faster because of the memory allocations needed when converting to `np.array`. Besides, if you're going to use Numpy, I'd try `np.dot(values, values)` first. – Fred Foo May 26 '12 at 20:10
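A rough sketch of the `np.dot` suggestion from the comment above, assuming the counts are first copied out of the distribution into an array. `collections.Counter` again stands in for the FreqDist, and the sample words are made up.

```python
from collections import Counter

import numpy as np

# Stand-in for t.freq_dist (assumption): a mapping of word -> count.
freq_dist = Counter(["the", "the", "cat", "sat", "down"])

# Copying the counts into an array is the allocation the comment warns about.
values = np.array(list(freq_dist.values()))

# The dot product of a vector with itself is the sum of its squares.
total = int(np.dot(values, values))
```

Whether this beats a plain `sum()` depends on how large the distribution is, since the conversion has its own cost.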

If the second loop depends on total, then no, there's not going to be a better way to do it. If it doesn't, then yes: the better (i.e., faster) way would be to do the work from the other loop inside the current loop. In practice, though, the speedup should be insignificant.
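A minimal sketch of folding both jobs into one pass, assuming the second job does not need the finished total. The second task here (collecting words seen exactly once) is purely hypothetical, and `collections.Counter` stands in for `t.freq_dist`.

```python
from collections import Counter

# Stand-in for t.freq_dist (assumption): a mapping of word -> count.
freq_dist = Counter(["the", "the", "cat", "sat", "down"])

total = 0
singletons = []  # hypothetical second task: words that occur exactly once

for word, count in freq_dist.items():
    total += count * count        # first job: sum of squared frequencies
    if count == 1:                # second job, done in the same pass
        singletons.append(word)
```

If the second job needs the final value of `total`, two loops are unavoidable, as noted above.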

Emil Vikström