I have a list of lists, built with the ElementTree library:

   [['word1', <Element tag at b719a4cc>], ['word2', <Element tag at b719a6cc>], ['word3', <Element tag at b719a78c>], ['word4', <Element tag at b719a82c>]]

word1..4 may contain Unicode characters, e.g. â, ü, ç.

I want to sort this list of lists by my custom alphabet.

I know how to sort by a custom alphabet from the question "sorting words in python".

I also know how to sort by key from http://wiki.python.org/moin/HowTo/Sorting

The problem is that I couldn't find a way to combine these two methods to sort my list of lists.

microspace
  • Fine question. If you supplied enough code that we could run it, I bet someone would just post a full solution (especially if you post what you have tried). – Brian Larsen May 18 '12 at 02:46
  • I agree with Brian: add some code that we can copy and paste, and it would probably take someone less than 5 minutes to write a fully working answer. – John La Rooy May 18 '12 at 03:22
  • Hello! I have one more issue. How can I make the sorting **case insensitive**? – microspace May 19 '12 at 17:05
  • You could try changing c in the lambda function to c.lower(), which will convert the character to lower case. But that might not work for your character set. If it doesn't, you could list your alphabet with consecutive upper-case and lower-case characters, e.g. "AaBbCc...", and then change the lambda function to return int(alphabet.index(c)/2), which should map each pair of adjacent characters in your list to the same priority. – happydave May 20 '12 at 02:38
  • (alphabet.index(c)/2) is a good solution, but for a, e, i and o I have some special diacritic letters, for example: alphabet = u'aáàAâÂbBcCçÇdDeéEfFgGğĞhHiİîÎíīıIjJkKlLmMnNóoOöÖpPqQrRsSşŞtTuUûúÛüÜvVwWxXyYzZ'. How do I handle them? Thank you. – microspace May 20 '12 at 05:10
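
One possible way to handle both case and accented variants, sketched here under the assumption that you decide up front which characters should count as equal, is to write the alphabet as a list of groups and give every character in a group the same rank. The `groups` list, the `rank` table, the `sort_key` helper and the sample words are illustrative only and are not taken from the original dictionary:

```python
# Each string is one group of characters that should sort as equal.
# (Hypothetical grouping; adjust it to whatever your alphabet requires.)
groups = [u"aáàA", u"âÂ", u"bB", u"cC", u"çÇ", u"dD", u"eéE"]

# Every character in a group gets the group's position as its rank.
rank = {ch: i for i, group in enumerate(groups) for ch in group}

def sort_key(item):
    # item is a [word, element] pair; only the word decides the order.
    return [rank[ch] for ch in item[0]]

# None stands in for the Element objects of the real list.
data = [[u"Çad", None], [u"bâc", None], [u"Ab", None], [u"áa", None], [u"ca", None]]
ordered = sorted(data, key=sort_key)
ordered_words = [word for word, _ in ordered]
print(ordered_words)
```

Because `sorted` is stable, words whose characters all fall into the same groups keep their original relative order. A character that appears in no group raises `KeyError`, which at least makes gaps in the alphabet easy to spot.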

3 Answers

Your first link more or less solves the problem. You just need to have the lambda function only look at the first item in your list:

alphabet = "zyxwvutsrqpomnlkjihgfedcba"

new_list = sorted(inputList, key=lambda word: [alphabet.index(c) for c in word[0]])

One modification I might suggest, if you're sorting a reasonably large list, is to change the alphabet structure into a dict first, so that index lookup is faster:

# map each character to its position in the custom alphabet
alphabet_dict = dict([(x, alphabet.index(x)) for x in alphabet])
new_list = sorted(inputList, key=lambda word: [alphabet_dict[c] for c in word[0]])
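
As a rough, self-contained sketch of the dict-based variant: the sample words and the placeholder strings standing in for the Element objects are made up for illustration, and `enumerate` is used here instead of repeated `alphabet.index` calls to build the lookup table.

```python
# Custom alphabet from the answer: "z" sorts first, "a" sorts last.
alphabet = "zyxwvutsrqpomnlkjihgfedcba"

# Map each character to its rank once; each later lookup is a dict access.
rank = {ch: i for i, ch in enumerate(alphabet)}

# Hypothetical stand-in for the question's [word, Element] pairs.
input_list = [["abc", "elem1"], ["zeta", "elem2"], ["mango", "elem3"], ["zebra", "elem4"]]

# The key looks only at item[0] (the word) and turns it into a list of ranks.
new_list = sorted(input_list, key=lambda item: [rank[ch] for ch in item[0]])

sorted_words = [item[0] for item in new_list]
print(sorted_words)
```

Note that a character missing from the alphabet raises `KeyError` here (or `ValueError` with `alphabet.index`), so the alphabet needs to cover every character that can occur in the words.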
happydave

If I'm understanding you correctly, you want to know how to apply the key sorting technique when the key should apply to an element of your object. In other words, you want to apply the key function to 'wordx', not the ['wordx', ...] element you are actually sorting. In that case, you can do this:

my_alphabet = "..."

def my_key(elem):
    word = elem[0]
    return [my_alphabet.index(c) for c in word]

my_list.sort(key=my_key)

or using the style in your first link:

my_alphabet = "..."
my_list.sort(key=lambda elem: [my_alphabet.index(c) for c in elem[0]])

Keep in mind that my_list.sort will sort in place, actually modifying your list. sorted(my_list, ...) will return a new sorted list.
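
A small sketch of that difference, assuming a made-up three-letter alphabet and made-up `[word, payload]` pairs:

```python
my_alphabet = "cba"  # hypothetical alphabet: c < b < a

def my_key(elem):
    # Sort on the word only (elem[0]); the second item is carried along untouched.
    return [my_alphabet.index(c) for c in elem[0]]

my_list = [["ab", 1], ["ca", 2], ["b", 3]]

# sorted() builds a new list and leaves my_list as it was.
copy_sorted = sorted(my_list, key=my_key)
original_order = [e[0] for e in my_list]

# list.sort() reorders my_list itself and returns None.
result = my_list.sort(key=my_key)
in_place_order = [e[0] for e in my_list]
```

Use `sorted` when the original order is still needed afterwards, and `sort` when it is not and you would rather avoid the extra copy.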

Karl Knechtel
KP.

Works great! Thank you for your help. Here is my story: I have a Turkish-Russian dictionary in XDXF format, and the problem was sorting it. I found a solution at http://effbot.org/zone/element-sort.htm, but it didn't sort Unicode characters correctly. Here is the final source code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET

alphabet = u"aâbcçdefgğhiıjklmnoöpqrstuüvwxyz"
tree = ET.parse("dict.xml")
# this element holds the dictionary entries
container = tree.find("entries")
# pair each entry with its headword (the text of its <k> child)
data = []
for elem in container:
    keyd = elem.findtext("k")
    data.append([keyd, elem])
# sort by the position of each character in the custom alphabet
data.sort(key=lambda pair: [alphabet.index(c) for c in pair[0]])
# put the entries back in sorted order and save the result
container[:] = [item[-1] for item in data]
tree.write("new-dict.xml", encoding="utf-8")

Sample content of dict.xml:

<cont>
  <entries>
<ar><k>â</k>def1</ar>
<ar><k>a</k>def1</ar>
<ar><k>g</k>def1</ar>
<ar><k>w</k>def1</ar>
<ar><k>n</k>def1</ar>
<ar><k>u</k>def1</ar>
<ar><k>ü</k>def1</ar>
<ar><k>âb</k>def1</ar>
<ar><k>ç</k>def1</ar>
<ar><k>v</k>def1</ar>
<ar><k>ac</k>def1</ar>
  </entries>
</cont>
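
For anyone who wants to try this without a file on disk, here is a minimal sketch that runs the same idea on a shortened, in-memory copy of the sample. The `xml_text` string, the `rank` table and the `nfc` helper are my own additions; the NFC normalization is only a precaution, on the assumption that accented letters could otherwise arrive as a base letter plus a combining mark and miss the alphabet lookup.

```python
import unicodedata
import xml.etree.ElementTree as ET

def nfc(text):
    # Normalize so that e.g. "â" is always a single code point.
    return unicodedata.normalize("NFC", text)

alphabet = nfc(u"aâbcçdefgğhiıjklmnoöpqrstuüvwxyz")
rank = {ch: i for i, ch in enumerate(alphabet)}

# Shortened copy of the sample file (assumption: same layout as dict.xml).
xml_text = u"""<cont>
  <entries>
<ar><k>â</k>def1</ar>
<ar><k>a</k>def1</ar>
<ar><k>ü</k>def1</ar>
<ar><k>âb</k>def1</ar>
<ar><k>ç</k>def1</ar>
<ar><k>ac</k>def1</ar>
  </entries>
</cont>"""

root = ET.fromstring(xml_text)
container = root.find("entries")

# Sort the <ar> children by the custom-alphabet rank of their <k> text,
# then write them back into the container in the new order.
container[:] = sorted(
    container,
    key=lambda ar: [rank[ch] for ch in nfc(ar.findtext("k"))],
)

sorted_keys = [nfc(ar.findtext("k")) for ar in container]
print(sorted_keys)
```

Headwords containing a character that is not in `alphabet` raise `KeyError`, so a real dictionary would need either a complete alphabet or a fallback rank for unknown characters.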

Thanks to all.

microspace