How do I case fold a string in Python 2?

Question

Python 3.3 adds the casefold method to the str type, but in 2.x I don't have anything. What's the best way to work around this?

Do you need to deal with non-English strings? – roippi Aug 16 '13 at 11:12
Yes. I want to run the Unicode case folding algorithm. – Devin Jeanpierre Aug 16 '13 at 11:35

score 4 · Answer 1 · answered Sep 28 '15 at 21:23


Check out py2casefold.

>>> from py2casefold import casefold
>>> print casefold(u"tschüß")
tschüss
>>> casefold(u"ΣίσυφοςﬁÆ") == casefold(u"ΣΊΣΥΦΟσFIæ") == u"σίσυφοσfiæ"
True
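If the same code has to run on both Python 2 and Python 3, one option is to prefer the built-in method where it exists and fall back to the package otherwise. The following is a minimal sketch, assuming py2casefold is installed on Python 2 and that its casefold function accepts a unicode string as in the session above; the shim name `casefold` is just a convenient choice.

```python
try:
    # Python 3.3+: use the built-in method as a plain function.
    casefold = str.casefold
except AttributeError:
    # Python 2: str has no casefold attribute, so fall back to the
    # third-party package (assumed to be installed).
    from py2casefold import casefold

# Both spellings should fold to the same string.
print(casefold(u"STRASSE") == casefold(u"Stra\u00dfe"))
```

On Python 2 the argument should be a unicode object rather than a byte string, as in the examples above.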

answered Sep 28 '15 at 21:23

Russ


[It doesn't seem to be very well tested](https://github.com/rwarren/py2casefold/blob/39a14b8971040b8f6015b7fa1a401c19c121175f/tests/test_casefold.py) – jfs Sep 29 '15 at 03:07
J.F. Sebastian -- what would you add? For reference, [here](https://hg.python.org/cpython/file/d4669f43d05f/Lib/test/test_unicode.py#l568) is the extent of python 3's `str.casefold` unit test. – Russ Sep 29 '15 at 03:56
see whether you'll manage to fish out a useful test from [`regex:test_case_folding()`](https://bitbucket.org/mrabarnett/mrab-regex/src/cbdb3caaee9ec68fdc2bff7e30902fb1dbdd3fd7/regex_3/Python/test_regex.py?at=default&fileviewer=file-view-default#test_regex.py-571) – jfs Sep 29 '15 at 04:04
J.F. Sebastian -- Thanks for the link, but there wasn't much useful I could see in that regex testing (a comment or two would be nice in there!). Even still, I wasn't super pleased with the slim unit testing either, so I beefed it up a tad. There really isn't much to the casefolding operation, though. Not a heck of a lot that *can* be tested. – Russ Sep 29 '15 at 04:29
I can't believe that something is uncomplicated in Unicode e.g., are you sure no casefolding properties have been changed between different versions of the Unicode standard -- may I expect that casefold works the same between different Python 2 versions? Read [what @tchrist says on the related topic](http://stackoverflow.com/a/6996550/4279) – jfs Sep 29 '15 at 04:44
J.F. Sebastian -- I know what you mean, but case folding is actually fundamentally pretty simple. @tchrist is 100% correct, but he is basically just saying to use the unicode casefold "algorithm" and *not* lowercase. As far as functionality between python 2 versions, operation should be 100% identical. The case folding operation is basically just a lookup table. The key is to use the [master unicode table](http://www.unicode.org/Public/UNIDATA/CaseFolding.txt), which is currently at 8.0.0 (and included in the package). – Russ Sep 29 '15 at 05:16
Oh... and all the @tchrist tests will definitely pass (if he got them right :) ). Several cases are already covered, and are pretty standard for case folding. – Russ Sep 29 '15 at 05:17
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/90849/discussion-between-j-f-sebastian-and-russ). – jfs Sep 29 '15 at 05:25
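The "basically just a lookup table" description in the comments can be sketched as follows. This is an illustration of the idea, not the py2casefold implementation: the function names `load_folding_table` and `casefold` and the constant `SAMPLE` are hypothetical, and `SAMPLE` holds only a handful of lines in the CaseFolding.txt format, so a real table would have to be loaded from the full file. Full case folding is assumed to mean the mappings with status C (common) and F (full); S (simple) and T (Turkic) entries are skipped.

```python
try:
    unichr  # Python 2
except NameError:
    unichr = chr  # Python 3

# A tiny excerpt in the CaseFolding.txt line format:
#   <code>; <status>; <mapping>; # <name>
# The real file is far larger; this sample is for illustration only.
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA
03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
FB01; F; 0066 0069; # LATIN SMALL LIGATURE FI
"""


def load_folding_table(text):
    """Build a {character: folded string} dict from CaseFolding.txt-style text."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        code, status, mapping = [field.strip() for field in data.split(";")[:3]]
        if status in ("C", "F"):  # full case folding uses C and F entries only
            table[unichr(int(code, 16))] = u"".join(
                unichr(int(cp, 16)) for cp in mapping.split())
    return table


def casefold(u, table):
    """Fold a unicode string one character at a time via the lookup table."""
    return u"".join(table.get(ch, ch) for ch in u)


TABLE = load_folding_table(SAMPLE)
print(casefold(u"tsch\u00fc\u00df", TABLE) == u"tsch\u00fcss")
```

Characters missing from the table are passed through unchanged, which is why the sample above only folds the few characters it lists. On a narrow Python 2 build, `unichr` cannot produce code points above U+FFFF, so loading the full file would need extra handling there.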

score 2 · Answer 2 · edited May 23 '17 at 11:54


There is a thread here that covers some of the issues, though it may not resolve all of them; you can judge whether it suits your needs. If it does not, there are some useful tips for implementing case folding on the W3C site here.

edited May 23 '17 at 11:54

Community


answered Aug 16 '13 at 13:18

ChrisProsser


score 1 · Answer 3 · edited Jan 18 '21 at 12:35

If PyICU is already installed, you could use it to define casefold(). Using the same example strings as in @Russ's answer:

>>> import icu
>>> casefold = lambda u: unicode(icu.UnicodeString(u).foldCase())
>>> print casefold(u"tschüß")
tschüss
>>> casefold(u"ΣίσυφοςﬁÆ") == casefold(u"ΣΊΣΥΦΟσFIæ") == u"σίσυφοσfiæ"
True
>>> icu.UNICODE_VERSION
'6.3'
>>> import unicodedata
>>> unicodedata.unidata_version
'5.2.0'

The result may depend on the version of the Unicode standard: in the session above, ICU reports Unicode 6.3 while Python's unicodedata module reports 5.2.0.
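To notice such a mismatch at runtime, the version strings can be compared numerically instead of as text. This is a small sketch with hypothetical helper names (`version_tuple`, `same_unicode_version`); it assumes a short version such as '6.3' should be treated as '6.3.0', and it uses only `unicodedata.unidata_version` from the standard library.

```python
import unicodedata


def version_tuple(version):
    """Turn a version string such as '5.2.0' into the tuple (5, 2, 0)."""
    return tuple(int(part) for part in version.split("."))


def same_unicode_version(a, b):
    """Compare two version strings, padding with zeros so '6.3' == '6.3.0'."""
    ta, tb = version_tuple(a), version_tuple(b)
    width = max(len(ta), len(tb))
    ta = ta + (0,) * (width - len(ta))
    tb = tb + (0,) * (width - len(tb))
    return ta == tb


# Version of the Unicode database shipped with this interpreter.
print(unicodedata.unidata_version)
# The two versions seen in the session above do not match.
print(same_unicode_version("6.3", "5.2.0"))
```

Comparing tuples rather than raw strings avoids orderings such as '10.0.0' sorting before '8.0.0'.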
