2

In:

case-insensitive list sorting, without lowercasing the result?

I've seen two solutions (let's assume the input is a list of UTF-8 strings, e.g. ['z1', 'A1', 'a0', 'bC']):

  • Without lambda: `L.sort(key=str.lower)`
  • With lambda: `L.sort(key=lambda s: s.lower())`

What are the differences? Which is better or more "pythonic"?

(As tagged, this question is about python-3.x. Comments on behaviour specific to Python 2 are welcome, but please mark them as such.)

Community
Grzegorz Wierzowiecki
  • 3
    The first one. Because simple is better than complex (and it is much faster) – JBernardo Oct 16 '12 at 19:31
  • @JBernardo -- make that an answer and I'll upvote it. – mgilson Oct 16 '12 at 19:32
  • 1
    As a general note, unless a lambda is very, very concise (and more concise than any alternative), it's generally the wrong course of action. Python has functions you can pass around as ordinary variables, so lambdas are not actually that useful. – Gareth Latty Oct 16 '12 at 19:40
  • Reminds me of various comments about regular expressions. In [Reactive Programming with JavaScript](https://www.amazon.com/dp/1783558555/) I explained why I considered them one of JavaScript's (write-only) bad parts, and deliberately ported a model in which the central use of regular expressions was superseded by a program that accomplished a similar effect, but written in e.g. English, not runes and line noise. – Christos Hayward Jul 21 '16 at 19:08

2 Answers

4

str.lower is an unbound method of the str type (looked up on the class rather than on an instance), while lambda s: s.lower() is an anonymous function. The end effect is the same: for each element in L, the key callable is called with that element as its argument.
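As a quick check of that claim, here is a minimal sketch using the example list from the question. The names `by_method` and `by_lambda` are only illustrative, and `sorted` is used instead of `L.sort` so the input stays untouched.

```python
# Example input taken from the question.
L = ['z1', 'A1', 'a0', 'bC']

by_method = sorted(L, key=str.lower)
by_lambda = sorted(L, key=lambda s: s.lower())

# Both key callables are called once per element and produce the same keys,
# so the order is identical and the original casing is preserved.
print(by_method)               # ['a0', 'A1', 'bC', 'z1']
print(by_method == by_lambda)  # True
```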

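The effects could be different if you are not sorting strings. The str.lower method only works with str instances, while the anonymous lambda function will work with anything that has a .lower() method; use this when your values are bytes rather than str, or a mixture of str and unicode in Python 2, for example. (In Python 3, a list that mixes bytes and str still cannot be sorted this way, because the lowercased keys cannot be compared with each other.)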
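A minimal sketch of that difference, assuming Python 3 and a hypothetical list of bytes values (not from the question):

```python
byte_values = [b'z1', b'A1', b'a0', b'bC']  # hypothetical input

# The lambda only needs each element to have a .lower() method.
sorted_bytes = sorted(byte_values, key=lambda s: s.lower())
print(sorted_bytes)  # [b'a0', b'A1', b'bC', b'z1']

# str.lower insists on str instances and rejects bytes.
error = None
try:
    sorted(byte_values, key=str.lower)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```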

I prefer the first version: it's shorter and a little faster. Moreover, it raises an error if my values are not strings, which is usually what you want.
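If you want to check the speed difference on your own machine, a rough sketch with `timeit` could look like this. The workload size and repeat count are arbitrary assumptions, and the numbers will vary between machines and Python versions.

```python
import timeit

# Hypothetical workload: the question's example list repeated many times.
words = ['z1', 'A1', 'a0', 'bC'] * 1000

# Time 100 full sorts with each key callable.
t_method = timeit.timeit(lambda: sorted(words, key=str.lower), number=100)
t_lambda = timeit.timeit(lambda: sorted(words, key=lambda s: s.lower()), number=100)

print(f"str.lower: {t_method:.4f}s  lambda: {t_lambda:.4f}s")
```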

Martijn Pieters
  • 1
    To be fair, it has different semantics if `s` is not a string (but has a unary `lower` method). But that's unlikely, probably unexpected, and if it's unexpected the `str.lower` version is better because it throws an error instead of silently working. Exception: Unicode strings in Python 2. I'm not really sure what happens then. –  Oct 16 '12 at 19:35
  • @delnan: yeah, that is an important difference. Expanded. – Martijn Pieters Oct 16 '12 at 19:38
  • @delnan -- could you make an answer from your comment about differences in semantics, so I could select it as solution? – Grzegorz Wierzowiecki Oct 16 '12 at 19:45
  • @delnan: passing a `unicode` string to `str.lower` doesn't work, it throws an error. Same for `bytes` strings in python 3.x. – Martijn Pieters Oct 16 '12 at 19:48
  • Of course if someone has a custom string class that's overridden `lower`, then `str.lower` could give misleading results. Another option which is similar to `lambda s: s.lower()` is to borrow from the operator module and provide `key=methodcaller('lower')` – Jon Clements Oct 16 '12 at 19:55
2

A 3rd option is methodcaller from the operator module.

```python
from operator import methodcaller

L.sort(key=methodcaller('lower'))  # calls item.lower() for each item
```

It's equivalent to the lambda option in what it does but, depending on taste, is nice and readable and, being from the operator module, fairly nippy. Note that str.lower will break if the object is a unicode string in Python 2 (and vice versa: unicode.lower breaks on a str).
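To illustrate the equivalence, here is a small sketch assuming Python 3. The name `lower` is just a local alias chosen for this example.

```python
from operator import methodcaller

L = ['z1', 'A1', 'a0', 'bC']   # example input from the question
lower = methodcaller('lower')  # lower(x) behaves like x.lower()

# Like the lambda, it is duck-typed: it works on any object with .lower().
print(lower('bC'))   # bc
print(lower(b'bC'))  # b'bc'

result = sorted(L, key=lower)
print(result)        # ['a0', 'A1', 'bC', 'z1']
```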

Jon Clements
  • Are you sure that `str.lower` will break on `unicode`? The question is about Python3k, which I thought is better prepared for `utf-8` than py2k. – Grzegorz Wierzowiecki Oct 16 '12 at 20:14
  • @GrzegorzWierzowiecki: I am sure, `str.lower` with a `unicode` value in python 2, or a `bytes` value in python 3, throws a `TypeError`. – Martijn Pieters Oct 16 '12 at 20:16
  • So `python-3.x` works with `unicode` (but not with `bytes`). Thanks for confirming! – Grzegorz Wierzowiecki Oct 16 '12 at 20:19
  • @GrzegorzWierzowiecki Your mention of UTF-8 tells me you should read [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://joelonsoftware.com/articles/Unicode.html). –  Oct 16 '12 at 20:40
  • Nice post. It will be useful for readers. The many ways of encoding Unicode (utf-8/16/32, big/little endian, with or without BOM...) are why I've narrowed my question to `utf-8`. This reminds me of the very nice [subchapter 4.2 Unicode of Dive into Python 3](http://getpython3.com/diveintopython3/strings.html) – Grzegorz Wierzowiecki Oct 16 '12 at 22:14