Rearranging list items based on a score to fit a function curve

Question

Given that I have:

a list of words
points/scores that indicate the "simplicity" of each word
the difficulty level (bin) of each word

E.g.

>>> words = ['apple', 'pear', 'car', 'man', 'average', 'older', 'values', 'coefficient', 'exponential']
>>> points = ['9999', '9231', '8231', '5123', '4712', '3242', '500', '10', '5']
>>> bins = [0, 0, 0, 0, 1, 1, 1, 2, 2]

Currently, the word list is ordered by the simplicity points.

What if I want to model the simplicity as a "quadratic curve", i.e. going from the highest points down to a low point and then back up to high? That is, I want to produce a word list that looks like this (with the corresponding points):

['apple', 'pear', 'average', 'coefficient', 'exponential', 'older', 'values', 'car', 'man']

I have tried this, but it's painfully convoluted:

>>> from collections import Counter
>>> counts = Counter(bins)
>>> num_easy, num_mid, num_hard = counts[0], counts[1], counts[2]
>>> num_easy
4
>>> easy_words = words[:num_easy]
>>> mid_words = words[num_easy:num_easy+num_mid]
>>> hard_words = words[-num_hard:]
>>> easy_words, mid_words, hard_words
(['apple', 'pear', 'car', 'man'], ['average', 'older', 'values'], ['coefficient', 'exponential'])
>>> easy_1 = easy_words[:num_easy // 2]
>>> easy_2 = easy_words[len(easy_1):]
>>> mid_1 = mid_words[:num_mid // 2]
>>> mid_2 = mid_words[len(mid_1):]
>>> new_words = easy_1 + mid_1 + hard_words + mid_2 + easy_2
>>> new_words
['apple', 'pear', 'average', 'coefficient', 'exponential', 'older', 'values', 'car', 'man']

Now imagine the number of bins is greater than 3, or that I want the points of the words to fit a sine-shaped curve.
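For more than three bins, the slicing above can be generalized. The sketch below assumes that a lower bin number means an easier word, that the words inside each bin are already in the desired order (e.g. by points), and that the hardest bin should sit whole in the middle while every other bin is split in half between the descending and ascending sides. `valley_by_bins` is a hypothetical helper name, not an existing API.

```python
def valley_by_bins(words, bins):
    """Hypothetical helper: arrange words easy -> hard -> easy by bin.

    Assumes a lower bin number means an easier word and that, within a bin,
    words are already in the order you want them.
    """
    # Group words by bin value, keeping their original order inside each bin.
    groups = {}
    for b, w in zip(bins, words):
        groups.setdefault(b, []).append(w)
    ordered = [groups[b] for b in sorted(groups)]
    if not ordered:
        return []
    # The hardest bin goes in the middle as a whole; every other bin is split
    # in half: first halves on the way down, second halves on the way back up.
    *outer, hardest = ordered
    result = []
    for g in outer:
        result.extend(g[:len(g) // 2])
    result.extend(hardest)
    for g in reversed(outer):
        result.extend(g[len(g) // 2:])
    return result


words = ['apple', 'pear', 'car', 'man', 'average', 'older', 'values', 'coefficient', 'exponential']
bins = [0, 0, 0, 0, 1, 1, 1, 2, 2]
print(valley_by_bins(words, bins))
# ['apple', 'pear', 'average', 'coefficient', 'exponential', 'older', 'values', 'car', 'man']
```

Because the bins are collected into a dict and then sorted by key, the same code works unchanged for four, five or more bins, and the words of a bin do not need to be contiguous in the input.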

Note that this is not exactly an NLP question, nor does it have anything to do with the Zipf distribution or with creating something to match or reorder the ranking of the words.

Imagine there's a list of integers, each mapped to an object (in this case a word), and you want to reorder the list of objects to fit a quadratic curve.
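One way to handle an arbitrary curve (quadratic, sine-shaped, or anything else) is rank matching: evaluate the curve at each output position, then give the lowest-scoring word to the position with the lowest curve value, the next one to the next lowest, and so on. This is a sketch of that technique, not something taken from the question; `fit_to_curve` is a hypothetical name, and only the ordering of the curve values matters, not their scale.

```python
def fit_to_curve(items, scores, curve):
    """Hypothetical helper: reorder items so their scores follow curve's shape.

    curve maps a position in [0, 1] to a target value; only the ranking of
    the target values is used.
    """
    n = len(items)
    if n == 0:
        return []
    # Target curve value for each output position, positions spread over [0, 1].
    targets = [curve(i / (n - 1)) if n > 1 else curve(0.0) for i in range(n)]
    # Output positions ordered from lowest to highest target value
    # (sorted is stable, so tied positions stay in left-to-right order).
    slots = sorted(range(n), key=lambda i: targets[i])
    # Items ordered from lowest to highest score.
    ranked = sorted(zip(scores, items), key=lambda pair: pair[0])
    # Lowest score goes to the lowest point of the curve, and so on.
    result = [None] * n
    for slot, (_, item) in zip(slots, ranked):
        result[slot] = item
    return result


words = ['apple', 'pear', 'car', 'man', 'average', 'older', 'values', 'coefficient', 'exponential']
points = ['9999', '9231', '8231', '5123', '4712', '3242', '500', '10', '5']
scores = [int(p) for p in points]  # the question stores points as strings

# A "valley" parabola: high at both ends, lowest in the middle.
print(fit_to_curve(words, scores, lambda x: (x - 0.5) ** 2))
# ['pear', 'man', 'older', 'coefficient', 'exponential', 'values', 'average', 'car', 'apple']
```

For a sine-shaped arrangement you could pass something like `lambda x: math.sin(2 * math.pi * x)` (after `import math`). Keep in mind that floating-point rounding can make mathematically equal curve values differ slightly, so near-ties may be broken unevenly.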

Are the `points` irrelevant, or is the `bin` value derived from the points? — user2390182, Mar 02 '17 at 07:36
You mention "the word list is ordered by the simplicity `points`", but `points` doesn't look ordered in your example, as it has the subsequence `'5123', '3242', '4712'`. Is there anything wrong in my understanding? — Rohanil, Mar 02 '17 at 07:41

score 2 · Answer 1 · answered Mar 02 '17 at 07:43

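Sort it into a list according to your custom criteria, then take every second item going forward and the remaining items going backward. Whether the length is even or odd decides where the backward pass starts, so that it always picks up exactly the odd-indexed items: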

>>> def peak(s):
...     return s[::2]+s[-1-(len(s)%2)::-2]
...
>>> peak('112233445566778')
'123456787654321'
>>> peak('1122334455667788')
'1234567887654321'

Note that uneven data may produce asymmetrical results:

>>> peak('11111123')
'11123111'
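`peak` works on lists as well as strings. To get the high-low-high "valley" the question asks for, one option is to sort the words by points in descending order first, so the lowest-scoring words end up in the middle. A sketch using the question's data (`by_points` and `valley_words` are just illustrative names):

```python
def peak(s):
    # Every second item going forward, then the odd-indexed items going backward.
    return s[::2] + s[-1 - (len(s) % 2)::-2]


words = ['apple', 'pear', 'car', 'man', 'average', 'older', 'values', 'coefficient', 'exponential']
points = ['9999', '9231', '8231', '5123', '4712', '3242', '500', '10', '5']

# Sort by simplicity, highest first, so the result dips towards the hardest words.
by_points = [w for _, w in sorted(zip(map(int, points), words), reverse=True)]
valley_words = peak(by_points)
print(valley_words)
# ['apple', 'car', 'average', 'values', 'exponential', 'coefficient', 'older', 'man', 'pear']
```

Sorting in ascending order instead would give the opposite shape: low at both ends and a peak of simplicity in the middle.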

user2390182 · Accepted Answer · 2017-03-02T07:54:33.523

I'd do something along these lines: sort the words by their points, take out every second one, reverse that half, and concatenate the two halves:

>>> s = sorted(zip(map(int, points), words))
>>> new_words = [word for _, word in list(reversed(s[::2])) + s[1::2]]
# If you have lots of words, you'll be better off using some
# itertools such as islice and chain, but the principle becomes evident
>>> new_words
['apple', 'car', 'average', 'values', 'exponential', 'coefficient', 'older', 'man', 'pear']

Ordered as in:

[(9999, 'apple'), (8231, 'car'), (4712, 'average'), (500, 'values'), (5, 'exponential'), (10, 'coefficient'), (3242, 'older'), (5123, 'man'), (9231, 'pear')]
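The comment about `islice` and `chain` can be read roughly as follows. This is one possible interpretation, not code from the answer: `reversed()` needs a real sequence, so the even-indexed half is still sliced into a list, while the odd-indexed half is read lazily with `itertools.islice` and `itertools.chain` joins the two without building a concatenated list. `valley_lazy` is a hypothetical name.

```python
from itertools import chain, islice


def valley_lazy(words, points):
    # Hypothetical helper; same ordering as the list-based version above.
    s = sorted(zip(map(int, points), words))
    # Even-indexed items reversed (slice needed for reversed()),
    # followed lazily by the odd-indexed items.
    ordered = chain(reversed(s[::2]), islice(s, 1, None, 2))
    return [word for _, word in ordered]


words = ['apple', 'pear', 'car', 'man', 'average', 'older', 'values', 'coefficient', 'exponential']
points = ['9999', '9231', '8231', '5123', '4712', '3242', '500', '10', '5']
print(valley_lazy(words, points))
# ['apple', 'car', 'average', 'values', 'exponential', 'coefficient', 'older', 'man', 'pear']
```

The sort itself still materializes the full list, so the saving is limited to avoiding the intermediate concatenation.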
