The Problem
I'm trying to count how often each string occurs in a list and sort the results in descending order of count. scipy.stats.itemfreq generates the frequency results, which are returned as a numpy array of string elements. This is where I'm stumped: how do I sort it?
So far I have tried operator.itemgetter, which appeared to work for a small list until I realised that the counts are compared as strings rather than being converted to integers. Strings are compared character by character, so '5' > '11' because it is comparing '5' and '1', not 5 and 11.
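To make the comparison issue concrete, here is a minimal sketch using only the counts from my example, written out by hand as strings (the variable names are just for illustration). It shows the string ordering next to the ordering obtained when each count is converted with int() in the sort key:

```python
# Counts from the example, as the strings that end up in the array
counts = ['2', '2', '11', '3', '1', '5']

# Strings compare character by character, so '5' sorts ahead of '11'
as_strings = sorted(counts, reverse=True)

# Converting each count with int() in the key compares numeric values instead
as_numbers = sorted(counts, key=int, reverse=True)

print(as_strings)  # ['5', '3', '2', '2', '11', '1']
print(as_numbers)  # ['11', '5', '3', '2', '2', '1']
```

The same idea should carry over to the rows of the array by using a key that converts the second element of each row to an integer, though I have not confirmed that this is the best approach.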
I'm using Python 2.7, NumPy 1.8.1 and SciPy 0.14.0.
Example Code:
from scipy.stats import itemfreq
import operator as op
items = ['platypus duck','platypus duck','platypus duck','platypus duck','cat','dog','platypus duck','elephant','cat','cat','dog','bird','','','cat','dog','bird','cat','cat','cat','cat','cat','cat','cat']
items = itemfreq(items)
items = sorted(items, key=op.itemgetter(1), reverse=True)
print items
print items[0]
Output:
[array(['platypus duck', '5'],
dtype='|S13'), array(['dog', '3'],
dtype='|S13'), array(['', '2'],
dtype='|S13'), array(['bird', '2'],
dtype='|S13'), array(['cat', '11'],
dtype='|S13'), array(['elephant', '1'],
dtype='|S13')]
['platypus duck' '5']
Expected Output:
I'm after ordering by count, so something like:
[array(['cat', '11'],
dtype='|S13'), array(['platypus duck', '5'],
dtype='|S13'), array(['dog', '3'],
dtype='|S13'), array(['', '2'],
dtype='|S13'), array(['bird', '2'],
dtype='|S13'), array(['elephant', '1'],
dtype='|S13')]
['cat' '11']
Summary
My question is: how do I sort the array (which in this case is a string array) in descending order of counts? Please feel free to suggest alternative, faster or otherwise improved approaches to my code sample above.
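One alternative that might fit, offered as a sketch rather than a definitive answer: collections.Counter from the standard library keeps the counts as integers, and its most_common() method returns (item, count) pairs ordered from the highest count to the lowest. The shortened items list below is an illustrative assumption, not the full data from my example.

```python
from collections import Counter

# Shortened, hypothetical sample; the real list is the one in the example code
items = ['platypus duck', 'cat', 'dog', 'cat', '', 'cat', 'dog', 'bird']

# Counter stores integer counts, so no string comparison is involved
counts = Counter(items)

# most_common() returns (item, count) pairs from highest count to lowest
ranked = counts.most_common()

print(ranked[0])  # ('cat', 3)
```

The result is a list of tuples rather than a numpy array, and the relative order of items with equal counts may vary between Python versions.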