0

I have a data structure consisting of a list of tag and weight pairs, like so:

tags = [['male vocalists', 4], ['Lo-Fi', 2], ['pop underground', 2], ['pop', 16], ['power pop', 99], ['post rock', 2], ['alternative', 59], ['electronic', 2], ['classic rock', 2], ['alternative rock', 14], ['pop rock', 2], ['baroque pop', 2], ['powerpop', 4], ['melodic', 2], ['seen live', 62], ['Bellshill', 3], ['singer-songwriter', 2], ['Favourites', 2], ['Teenage Fanclub', 4], ['emo', 2], ['glasgow', 12], ['Scottish', 73], ['indie pop', 27], ['indie', 100], ['00s', 3], ['new wave', 3], ['rap', 2], ['ambient', 2], ['brit pop', 2], ['90s', 14], ['britpop', 26], ['indie rock', 68], ['electronica', 2], ['shoegaze', 5], ['scotland', 11], ['post-punk', 3], ['Alt-country', 2], ['80s', 3], ['jangle pop', 7], ['guitar pop', 4], ['Pop-Rock', 2], ['rock', 31], ['favorites', 2], ['creation records', 3], ['All', 2], ['punk', 3], ['scottish pop', 2], ['british', 17], ['scottish indie', 2], ['slowcore', 2], ['UK', 6], ['jangly', 2]]

I know I can get the tag with the highest value with:

top = max(tags, key=lambda x:x[1])[0]

which yields indie, correctly.

But how do I get the N highest values, say, 5?

5 Answers

2

Slice the first 5 elements from a descending sort.

sorted(tags, key=lambda x:x[1], reverse=True)[:5]

MSeifert's answer is algorithmically better. For a large list of length n and a comparatively small number m of elements to take, heapq.nlargest may be quicker, since it takes O(n * log m) time whereas sorting then slicing takes O(n * log n). (See here for a rough outline of heapq.nlargest's algorithm). Then again, the log factors are small in practice and constant factors can dominate, so be sure to test if performance is a concern to you!
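A rough way to test this on your own machine is to time both approaches. The sketch below is only illustrative: the synthetic `data` list, its size, the helper names `top_sorted` and `top_heap`, and the repeat count are assumptions made here, not measurements from the question.

```python
import heapq
import random
import timeit
from operator import itemgetter

# Hypothetical synthetic data: 100,000 [tag, weight] pairs (size is arbitrary).
random.seed(0)
data = [[f"tag{i}", random.randint(1, 100)] for i in range(100_000)]
by_weight = itemgetter(1)

def top_sorted(pairs, n):
    # Sort everything in descending order, then slice off the first n.
    return sorted(pairs, key=by_weight, reverse=True)[:n]

def top_heap(pairs, n):
    # Let heapq track only the n largest items while scanning the list once.
    return heapq.nlargest(n, pairs, key=by_weight)

# Both approaches should agree before their timings are compared.
assert top_sorted(data, 5) == top_heap(data, 5)

for func in (top_sorted, top_heap):
    seconds = timeit.timeit(lambda: func(data, 5), number=5)
    print(f"{func.__name__}: {seconds:.3f}s for 5 runs")
```

The absolute numbers depend on the machine and Python version, so treat the output as a comparison on your own data rather than a general result.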

Trevor Merrifield
1

Use heapq.nlargest:

>>> import heapq

>>> heapq.nlargest(5, tags, key=lambda x:x[1])
[['indie', 100],
 ['power pop', 99],
 ['Scottish', 73],
 ['indie rock', 68],
 ['seen live', 62]]

or if you're only interested in the name:

>>> [name for name, _ in heapq.nlargest(5, tags, key=lambda x:x[1])]
['indie', 'power pop', 'Scottish', 'indie rock', 'seen live']
MSeifert
0

heapq lets you do some really cool stuff like that (with heapq and operator imported):

In [168]: heapq.nlargest(5, tags, key=operator.itemgetter(1))
Out[168]: 
[['indie', 100],
 ['power pop', 99],
 ['Scottish', 73],
 ['indie rock', 68],
 ['seen live', 62]]
inspectorG4dget
0

Use the sorted function, or the .sort method on the list. Both take a key= parameter like max. Then you can take the lowest, or highest, batch.

top = sorted(tags, reverse=True, key=lambda x:x[1])[0:5]
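To make the two variants concrete, here is a small sketch using a shortened subset of the question's `tags` list. The helper name `weight` and the batch size of 3 are choices made for this illustration. It shows `sorted` returning a new list, `.sort` reordering in place, and slicing off either the highest or the lowest batch.

```python
# Shortened subset of the question's data, used only for illustration.
tags = [['pop', 16], ['power pop', 99], ['alternative', 59],
        ['seen live', 62], ['Scottish', 73], ['indie', 100], ['rock', 31]]

def weight(pair):
    # The weight is the second element of each [tag, weight] pair.
    return pair[1]

# sorted() returns a new list and leaves `tags` untouched.
highest = sorted(tags, key=weight, reverse=True)[:3]
lowest = sorted(tags, key=weight)[:3]

# list.sort() reorders the list in place and returns None,
# so sort a copy if the original order still matters.
ranked = list(tags)
ranked.sort(key=weight, reverse=True)

print(highest)  # [['indie', 100], ['power pop', 99], ['Scottish', 73]]
print(lowest)   # [['pop', 16], ['rock', 31], ['alternative', 59]]
```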
aghast
0
import operator

def printTopX(tags, X):
    # Sort by weight (index 1), highest first, and print the first X pairs.
    print(sorted(tags, key=operator.itemgetter(1), reverse=True)[:X])

printTopX(tags, 5)
Claudio