
I'm working with Python's itertools, using groupby to group a bunch of pairs by their last element. I've gotten the data sorted and grouped, and I can iterate through the groups just fine, but I would really love to be able to get the length of each group without having to iterate through each one and increment a counter.

The project is to cluster some data points. I'm working with pairs of (numpy.array, int), where the numpy array is a data point and the integer is its cluster label.

Here's my relevant code:

import itertools

data = sorted(data, key=lambda pair: pair[1])  # sort by cluster label
for cluster, clusterList in itertools.groupby(data, key=lambda pair: pair[1]):
    if len(clusterList) < minLen:
        ...  # rest of the loop body omitted

On the last line (if len(clusterList) < minLen:) I get this error:

object of type 'itertools._grouper' has no len()

I've looked up the operations available on _grouper objects, but can't find anything that provides the length of a group.

Gilad Green
user1466679

3 Answers

Just because you call it clusterList doesn't make it a list! It's basically a lazy iterator, returning each item as it's needed. You can convert it to a list like this, though:

clusterList = list(clusterList)

Or do that and get its length in one step:

length = len(list(clusterList))

If you don't want to take up the memory of making it a list, you can do this instead:

length = sum(1 for x in clusterList)
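Applied to the whole groupby loop, the sum() formulation can record every group's size in one pass without storing any group. This is a minimal sketch; the sample data is hypothetical, with tuples standing in for the numpy arrays (the key only looks at the label, so the point type doesn't matter). The last lines use collections.Counter, a different technique from groupby: when only the sizes are needed, it counts labels directly and needs no sorting.

```python
import itertools
from collections import Counter
from operator import itemgetter

# Hypothetical (point, label) pairs, already sorted by label.
data = [((1.0, 1.0), 0), ((1.5, 2.0), 0), ((3.0, 3.0), 1),
        ((0.0, 1.0), 2), ((2.0, 0.5), 2), ((0.5, 0.5), 2)]

# Count each group without building a list: the generator yields 1 per item.
# Each group is counted before groupby advances to the next one.
sizes = {label: sum(1 for _ in group)
         for label, group in itertools.groupby(data, key=itemgetter(1))}
print(sizes)  # {0: 2, 1: 1, 2: 3}

# Alternative when the groups themselves aren't needed: count labels directly.
label_counts = Counter(label for _, label in data)
print(dict(label_counts) == sizes)  # True
```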

Be aware that the original iterator will be consumed entirely by either converting it to a list or using the sum() formulation.
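To make the consumption point concrete, here is a small sketch with hypothetical pairs. It shows that a counted group is empty on a second pass. It also reflects the itertools documentation's note that groups share one underlying iterator, so advancing groupby makes the previous group unavailable; the safe pattern is to copy each group with list() while it is current.

```python
import itertools
from operator import itemgetter

# Hypothetical (point, label) pairs, already sorted by label.
pairs = [("a", 1), ("b", 1), ("c", 2)]

groups = itertools.groupby(pairs, key=itemgetter(1))
label, group = next(groups)
first_count = len(list(group))  # 2: counting drains the group...
leftover = list(group)          # []: ...so a second pass sees nothing
print(first_count, leftover)    # 2 []

# Copy each group while it is the current one, before groupby moves on.
groups_by_label = {key: list(grp)
                   for key, grp in itertools.groupby(pairs, key=itemgetter(1))}
group_sizes = {key: len(items) for key, items in groups_by_label.items()}
print(group_sizes)  # {1: 2, 2: 1}
```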

kindall

clusterList is iterable, but it is not a list, which can be confusing. You can loop over clusterList with for, but you can't do other list operations on it (slicing, len(), etc.).

Fix: assign the result of list(clusterList) to clusterList.
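A sketch of the question's loop with that fix applied. The pairs and the threshold min_len are hypothetical stand-ins (strings instead of numpy arrays, min_len for the question's minLen). The try block only demonstrates the original TypeError; after rebinding clusterList to a list, both len() and slicing work.

```python
import itertools
from operator import itemgetter

# Hypothetical (point, label) pairs, already sorted by label;
# strings stand in for the numpy arrays from the question.
pairs = [("p0", 0), ("p1", 0), ("p2", 0), ("p3", 1)]
min_len = 2  # hypothetical stand-in for the question's minLen

errors = []
small_clusters = []
for cluster, clusterList in itertools.groupby(pairs, key=itemgetter(1)):
    try:
        len(clusterList)  # fails: a groupby group is a lazy iterator, not a list
    except TypeError as exc:
        errors.append(str(exc))
    clusterList = list(clusterList)  # the fix: rebind the name to a real list
    if len(clusterList) < min_len:   # len() works now...
        small_clusters.append(cluster)
    print(cluster, clusterList[:2])  # ...and so does slicing

print(small_clusters)  # [1]
```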

Brian Cain

You can use the cardinality package for that. Its count() function counts the number of items an iterable yields.

cardinality: determine and check the size of any iterable

The following code gives you the length of clusterList:

import cardinality
cardinality.count(clusterList)
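If all you need is the question's comparison (does a group have fewer than minLen items?), a full count isn't strictly necessary. Below is a stdlib-only sketch, not part of the cardinality package, that uses itertools.islice to pull at most n items and stop early. The helper name has_at_least, the sample data, and min_len are hypothetical. Like every approach here, it consumes the items it pulls, so use list() instead if you also need the group's contents.

```python
import itertools
from operator import itemgetter


def has_at_least(iterable, n):
    """Hypothetical helper: True if iterable yields at least n items.

    Pulls at most n items via islice, so it stops early instead of
    counting the whole iterable. The items it pulls are consumed.
    """
    return sum(1 for _ in itertools.islice(iterable, n)) == n


# Hypothetical (point, label) pairs, sorted by label; min_len plays minLen.
data = [("p0", 0), ("p1", 0), ("p2", 1), ("p3", 1), ("p4", 1)]
min_len = 3

too_small = [cluster
             for cluster, group in itertools.groupby(data, key=itemgetter(1))
             if not has_at_least(group, min_len)]
print(too_small)  # [0]
```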
Maryam Bahrami