Sorting iterables based on preferred order of strings

Question

Supposing I have a list/tuple like this:

MyLocation = 'DE'
(    
('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
('Pencils', '', 19.95, 'PVT', 'IT'),
('Pencils', '', 23.50, 'PRF1', 'US'),
('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
('Pencils', '', 12.50, 'NON', 'DE'))

I want to sort this in two passes, by the following rules:

1) Tuples whose element [4] matches the MyLocation string 'DE' go on top.
This is an intermediate step, so the relative order among the DE tuples doesn't matter, as long as all of them are at the top.

(    
('Pencils', '', 12.50, 'NON', 'DE'),
('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),    
('Pencils', '', 23.50, 'PRF1', 'US'),
('Pencils', '', 19.95, 'PVT', 'IT')       
)
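One way to get this intermediate order without `del`/`insert` is a single stable sort on a boolean key. A minimal sketch, with the rows bound to the illustrative name `records`:

```python
MyLocation = 'DE'

# The question's rows, bound to a name for illustration.
records = (
    ('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
    ('Pencils', '', 19.95, 'PVT', 'IT'),
    ('Pencils', '', 23.50, 'PRF1', 'US'),
    ('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
    ('Pencils', '', 12.50, 'NON', 'DE'),
)

# Key is False for rows at MyLocation and True otherwise; False < True,
# so MyLocation rows come first. sorted() is stable, so rows with equal
# keys keep their original relative order.
pass1 = sorted(records, key=lambda r: r[4] != MyLocation)

for row in pass1:
    print(row)
```

Here the non-DE rows keep their input order (IT before US); for this data the second pass puts US on top regardless.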

2) After that, sort on element [3]; the preferred order is ['PRF1', 'PRF2', 'PRF3']. Other strings can go below those.

My expected final sorted output is:

(    
('Pencils', '', 23.50, 'PRF1', 'US'),
('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),    
('Pencils', '', 12.50, 'NON', 'DE'),
('Pencils', '', 19.95, 'PVT', 'IT')       
)
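The two passes described above can be sketched directly as two stable sorts. Because Python's sort is stable, the second sort (by preferred rank of element [3]) keeps the first pass's order among rows with the same rank, so DE rows stay ahead within each rank. The names `records`, `PREFERRED` and `prf_rank` are illustrative, not from the question:

```python
MyLocation = 'DE'
PREFERRED = ['PRF1', 'PRF2', 'PRF3']  # preferred order for element [3]

records = (
    ('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
    ('Pencils', '', 19.95, 'PVT', 'IT'),
    ('Pencils', '', 23.50, 'PRF1', 'US'),
    ('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
    ('Pencils', '', 12.50, 'NON', 'DE'),
)

def prf_rank(row):
    """Position of row[3] in PREFERRED; unknown codes rank after all of them."""
    return PREFERRED.index(row[3]) if row[3] in PREFERRED else len(PREFERRED)

# Pass 1: MyLocation rows first (False sorts before True).
result = sorted(records, key=lambda r: r[4] != MyLocation)
# Pass 2: by preferred rank; stability keeps the pass-1 order within a rank.
result.sort(key=prf_rank)

for row in result:
    print(row)
```

Note that the pass that should win (the PRF rank) is the one applied last.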

How would I go about these two sorts? I can manage the first one with del and insert, but what is the recommended way?

tempList = list(actualList)  # a copy, so we don't modify the list we are looping over
for i, record in enumerate(actualList):
    if record[4] == 'DE':  # the location is element [4]
        del tempList[i]
        tempList.insert(0, record)
actualList = tempList
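A sketch of the first pass that avoids deleting from and inserting into a list while looping over it: build the two groups with list comprehensions and concatenate them. `actualList` is the question's name; `at_home` and `elsewhere` are illustrative:

```python
MyLocation = 'DE'

actualList = [
    ('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
    ('Pencils', '', 19.95, 'PVT', 'IT'),
    ('Pencils', '', 23.50, 'PRF1', 'US'),
    ('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
    ('Pencils', '', 12.50, 'NON', 'DE'),
]

# Two new lists, each preserving the original order of its rows.
at_home = [r for r in actualList if r[4] == MyLocation]
elsewhere = [r for r in actualList if r[4] != MyLocation]
actualList = at_home + elsewhere

print(actualList)
```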

I am especially confused about how to proceed with the second sort. Please provide code samples for it.

Fabiano · Answer 1 · 2012-04-12T03:25:45.410

This is enough:

PRF = ('PRF1', 'PRF2', 'PRF3')
sorted(records, key=lambda x:(x[4]!='DE', PRF.index(x[3]) if x[3] in PRF else 3))

Or, if you're going to use this more than once, you may want to define the key function separately:

k = lambda x: (x[4]!='DE', PRF.index(x[3]) if x[3] in PRF else len(PRF))

and then just use

sorted(records, key=k)

In your example:

>>> records = ( ('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
... ('Pencils', '', 19.95, 'PVT', 'IT'),
... ('Pencils', '', 23.50, 'PRF1', 'US'),
... ('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
... ('Pencils', '', 12.50, 'NON', 'DE') )
>>> import pprint
>>> pprint.pprint(sorted(records, key=k))
[('Pencils', 'Wooden Pencils', 23.5, 'PRF2', 'DE'),
 ('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
 ('Pencils', '', 12.5, 'NON', 'DE'),
 ('Pencils', '', 23.5, 'PRF1', 'US'),
 ('Pencils', '', 19.95, 'PVT', 'IT')]
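The output above lists every DE row first, so the PRF1/US row ends up fourth instead of on top as in the question's expected output. A sketch with the two key components swapped (PRF rank first, the DE check only as a tiebreaker), written as a `def` rather than an assigned lambda; `k_swapped` is an illustrative name:

```python
import pprint

PRF = ('PRF1', 'PRF2', 'PRF3')

def k_swapped(x):
    # Primary: position of x[3] in PRF (codes not in PRF get len(PRF), i.e. last).
    # Secondary: False for 'DE' rows, True otherwise, so DE wins ties.
    return (PRF.index(x[3]) if x[3] in PRF else len(PRF), x[4] != 'DE')

records = (
    ('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
    ('Pencils', '', 19.95, 'PVT', 'IT'),
    ('Pencils', '', 23.50, 'PRF1', 'US'),
    ('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
    ('Pencils', '', 12.50, 'NON', 'DE'),
)

result = sorted(records, key=k_swapped)
pprint.pprint(result)
```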

Devin Jeanpierre · Answer 2 · 2011-04-13T18:02:08.147

You only need one pass, with a special key function.

def key(t):
    return (
        dict(PRF1=0, PRF2=1, PRF3=2).get(t[3], 3), # earlier ones get smaller numbers
        int(t[4] != 'DE')) # 0 if DE, 1 otherwise

L.sort(key=key)

The key function returns a value that will be used to compare elements in the list. This one returns a tuple of two elements, and tuples compare based on the earliest different element. So (1, 0) < (2, -300) because 1 < 2.
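A small check of the tuple-comparison rule: the first position where two tuples differ decides, and later positions only matter when everything before them ties.

```python
pairs = [(3, 1), (0, 1), (1, 0), (3, 0), (2, -300)]

print((1, 0) < (2, -300))  # True: 1 < 2 decides; -300 is never compared
print((3, 0) < (3, 1))     # True: first elements tie, so 0 < 1 decides
print(sorted(pairs))       # [(0, 1), (1, 0), (2, -300), (3, 0), (3, 1)]
```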

The first value is the index of t[3] in the list ['PRF1', 'PRF2', 'PRF3'] or the number 3 if it isn't any of those. This means the earlier in the list it is, the lower the value, and the earlier in the sort results. The second value is already explained in the comments. :)
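If the preferred list may change, the rank dict can be derived from it once with `enumerate` instead of being rebuilt on every call. A sketch of the same key under that assumption; `PREFERRED`, `RANK` and `UNKNOWN` are illustrative names:

```python
PREFERRED = ['PRF1', 'PRF2', 'PRF3']

# Built once: {'PRF1': 0, 'PRF2': 1, 'PRF3': 2}
RANK = {code: i for i, code in enumerate(PREFERRED)}
UNKNOWN = len(PREFERRED)  # rank for codes not in PREFERRED (sorts last)

def key(t):
    # (preferred rank of t[3], 0 if DE else 1)
    return (RANK.get(t[3], UNKNOWN), int(t[4] != 'DE'))

L = [
    ('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
    ('Pencils', '', 19.95, 'PVT', 'IT'),
    ('Pencils', '', 23.50, 'PRF1', 'US'),
    ('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
    ('Pencils', '', 12.50, 'NON', 'DE'),
]
L.sort(key=key)

for row in L:
    print(row)
```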

Note that the 'DE' pass was made first to give 'DE' *lowest* priority. You should swap the return values. Also, index raises ValueError when the value is not one of the PRFs, which is possible in the example. – sverre, Apr 13 '11 at 17:59
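A short illustration of the `index` point: `list.index` raises `ValueError` for a missing value, while a membership test first (as in Answer 1's key) sidesteps it.

```python
PRF = ['PRF1', 'PRF2', 'PRF3']

raised = False
try:
    PRF.index('PVT')            # 'PVT' is not in the list
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# Guarded lookup: missing codes get len(PRF), so they sort after every PRF code.
rank = PRF.index('PVT') if 'PVT' in PRF else len(PRF)
print(rank)  # 3
```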

score 1 · Accepted Answer · answered Apr 13 '11 at 18:14


The general idea is to give each item a score. When you have multiple scores per item you can put it in a tuple.

MyLocation = 'DE'
location_score = { MyLocation : 1 }
that_other_field_score = {'PRF1' : 3, 'PRF2' : 2, 'PRF3' : 1}

def score(row):
    # returns a tuple of item scores
    # items not in the score dicts get score 0 for that field
    return (that_other_field_score.get(row[3], 0),
            location_score.get(row[4], 0))

data = [    
('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
('Pencils', '', 19.95, 'PVT', 'IT'),
('Pencils', '', 23.50, 'PRF1', 'US'),
('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
('Pencils', '', 12.50, 'NON', 'DE')]

# sort data, highest score first
data.sort(key=score, reverse=True)
print(data)

The location_score dict is arguably a bit overkill (you could just write (1 if row[4]=='DE' else 0)), but on the other hand it can be easily extended this way.
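A sketch of the inline alternative mentioned above, with a hypothetical `score_inline` that replaces the `location_score` lookup with the conditional expression; it orders the rows the same way as `score()` in the answer:

```python
that_other_field_score = {'PRF1': 3, 'PRF2': 2, 'PRF3': 1}

def score_inline(row):
    # Same scoring as score() in the answer, but the location check is
    # written inline instead of looked up in location_score.
    return (that_other_field_score.get(row[3], 0),
            1 if row[4] == 'DE' else 0)

data = [
    ('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
    ('Pencils', '', 19.95, 'PVT', 'IT'),
    ('Pencils', '', 23.50, 'PRF1', 'US'),
    ('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
    ('Pencils', '', 12.50, 'NON', 'DE'),
]

# Highest score first; reverse=True still keeps ties in their original order.
data.sort(key=score_inline, reverse=True)
print(data)
```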


Jochen Ritzel


Wow, that is some really clear and understandable code. Thanks! I have a question though; how does sorting on a score tuple work? Is (1,0) equal to (0,1)? I would think the score function could just add the scores and return their sum? – Pranab Apr 13 '11 at 19:14
OK, now I love your answer even more. It seems that tuples are compared element by element, so later elements carry less weight. Wow. http://stackoverflow.com/questions/644170/how-does-python-sort-a-list-of-tuples – Pranab Apr 13 '11 at 19:23
@Pranab: Yeah, tuples are compared element-wise, so `(0,100) < (1,0)`. Summing scores can be useful too, imagine you want to buy something: `price + distance * transportation_cost` is a much better metric than `(price, distance * transportation_cost)` :-) – Jochen Ritzel Apr 14 '11 at 00:29

score 0 · Answer 4 · answered Apr 13 '11 at 17:53


This is a bit hacky, but it should do.

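def prf_key(item):
    if item[3].startswith('PRF'):
        # PRF rows go first, ordered by the number after 'PRF'
        return (0, int(item[3][3:]))
    else:
        # everything else goes after, keeping its original relative order
        return (1, None)

actualList.sort(key=prf_key)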

The idea is that any PRF should go on top, so the key returns a tuple starting with 0 for those and 1 for the rest; the PRFs are then sorted among themselves by their number.
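A self-contained check of this idea, slicing the code string with `item[3][3:]`. The rows are made up, including a hypothetical `PRF10` code, to show why the number is compared via `int()` rather than as text:

```python
def prf_key(item):
    code = item[3]
    if code.startswith('PRF'):
        return (0, int(code[3:]))   # PRF rows first, ordered by their number
    return (1, None)                # everything else after, original order kept

rows = [
    ('Pencils', '', 1.0, 'PRF10', 'DE'),  # hypothetical code for illustration
    ('Pencils', '', 2.0, 'NON', 'DE'),
    ('Pencils', '', 3.0, 'PRF2', 'US'),
    ('Pencils', '', 4.0, 'PRF1', 'IT'),
]
rows.sort(key=prf_key)
print([r[3] for r in rows])   # ['PRF1', 'PRF2', 'PRF10', 'NON']

# Compared as plain strings, 'PRF10' would land between 'PRF1' and 'PRF2'.
as_text = sorted(['PRF10', 'PRF2', 'PRF1'])
print(as_text)                # ['PRF1', 'PRF10', 'PRF2']
```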


Fred Foo


Perhaps you could insert (item[4] != 'DE') in the middle of the returned tuple to catch that condition too? – sverre Apr 13 '11 at 18:04
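How much weight the location check gets depends on where it sits in the tuple. Placed directly after the leading 0/1, it would rank the DE/PRF2 row above the US/PRF1 row, unlike the question's expected output. A sketch using a hypothetical `prf_de_key` that instead places it after the PRF number, so it only breaks ties among PRF rows, and right after the 1 for all other rows:

```python
records = [
    ('Pencils', 'Artists Pencils', 18.95, 'PVT', 'DE'),
    ('Pencils', '', 19.95, 'PVT', 'IT'),
    ('Pencils', '', 23.50, 'PRF1', 'US'),
    ('Pencils', 'Wooden Pencils', 23.50, 'PRF2', 'DE'),
    ('Pencils', '', 12.50, 'NON', 'DE'),
]

def prf_de_key(item):
    # Hypothetical variant of prf_key with the location check added.
    not_de = item[4] != 'DE'   # False for DE rows, and False sorts first
    if item[3].startswith('PRF'):
        # PRF rows first; the PRF number decides, location only breaks ties
        return (0, int(item[3][3:]), not_de)
    # Other rows: DE before non-DE
    return (1, not_de)

result = sorted(records, key=prf_de_key)
for row in result:
    print(row)
```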
