Python 3.3 adds the casefold method to the str type, but in 2.x I don't have anything. What's the best way to work around this?
11

Charles

Devin Jeanpierre
-
do you need to deal with non-English strings? – roippi Aug 16 '13 at 11:12
-
Yes. I want to run the unicode case folding algorithm. – Devin Jeanpierre Aug 16 '13 at 11:35
3 Answers
4
Check out py2casefold.
>>> from py2casefold import casefold
>>> print casefold(u"tschüß")
tschüss
>>> casefold(u"ΣίσυφοςfiÆ") == casefold(u"ΣΊΣΥΦΟσFIæ") == u"σίσυφοσfiæ"
True
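One way to wire this into code that has to run on both 2.x and 3.3+ is a small compatibility shim. The following is only a sketch: it assumes py2casefold is installed on Python 2, and the helper name `caseless_equal` is made up for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

try:
    # Python 3.3+: use the built-in method on the text type.
    casefold = str.casefold
except AttributeError:
    # Python 2: str has no casefold, so fall back to the
    # third-party package (assumed to be installed).
    from py2casefold import casefold


def caseless_equal(a, b):
    """Hypothetical helper: compare two text strings ignoring case."""
    return casefold(a) == casefold(b)
```

With this in place, `caseless_equal(u"TSCHÜSS", u"tschüß")` goes through the same code path on either interpreter.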

Russ
-
[It doesn't seem to be very well tested](https://github.com/rwarren/py2casefold/blob/39a14b8971040b8f6015b7fa1a401c19c121175f/tests/test_casefold.py) – jfs Sep 29 '15 at 03:07
-
J.F. Sebastian -- what would you add? For reference, [here](https://hg.python.org/cpython/file/d4669f43d05f/Lib/test/test_unicode.py#l568) is the extent of python 3's `str.casefold` unit test. – Russ Sep 29 '15 at 03:56
-
see whether you'll manage to fish out a useful test from [`regex:test_case_folding()`](https://bitbucket.org/mrabarnett/mrab-regex/src/cbdb3caaee9ec68fdc2bff7e30902fb1dbdd3fd7/regex_3/Python/test_regex.py?at=default&fileviewer=file-view-default#test_regex.py-571) – jfs Sep 29 '15 at 04:04
-
J.F. Sebastian -- Thanks for the link, but there wasn't much useful I could see in that regex testing (a comment or two would be nice in there!). Even still, I wasn't super pleased with the slim unit testing either, so I beefed it up a tad. There really isn't much to the casefolding operation, though. Not a heck of a lot that *can* be tested. – Russ Sep 29 '15 at 04:29
-
I can't believe that something is uncomplicated in Unicode e.g., are you sure no casefolding properties have been changed between different versions of the Unicode standard -- may I expect that casefold works the same between different Python 2 versions? Read [what @tchrist says on the related topic](http://stackoverflow.com/a/6996550/4279) – jfs Sep 29 '15 at 04:44
-
J.F. Sebastian -- I know what you mean, but case folding is actually fundamentally pretty simple. @tchrist is 100% correct, but he is basically just saying to use the unicode casefold "algorithm" and *not* lowercase. As far as functionality between python 2 versions, operation should be 100% identical. The case folding operation is basically just a lookup table. The key is to use the [master unicode table](http://www.unicode.org/Public/UNIDATA/CaseFolding.txt), which is currently at 8.0.0 (and included in the package). – Russ Sep 29 '15 at 05:16
-
Oh... and all the @tchrist tests will definitely pass (if he got them right :) ). Several cases are already covered, and are pretty standard for case folding. – Russ Sep 29 '15 at 05:17
-
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/90849/discussion-between-j-f-sebastian-and-russ). – jfs Sep 29 '15 at 05:25
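To make the "it's basically just a lookup table" point from the comments concrete, here is a rough sketch of building such a table from text in the CaseFolding.txt format. It is not the py2casefold implementation; the embedded sample is a handful of lines in the file's format rather than the full file, and the names `SAMPLE_CASEFOLDING`, `load_table` and `casefold_sketch` are invented for this example.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

try:
    unichr  # Python 2
except NameError:
    unichr = chr  # Python 3

# A few lines in the CaseFolding.txt format:
#   <code>; <status>; <mapping>; # <name>
SAMPLE_CASEFOLDING = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA
03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def load_table(text):
    """Build a {char: folded string} dict from CaseFolding.txt-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        code, status, mapping = [f.strip() for f in line.split(";")[:3]]
        # Full case folding uses the common (C) and full (F) entries;
        # simple (S) and Turkic (T) entries are skipped.
        if status in ("C", "F"):
            folded = "".join(unichr(int(c, 16)) for c in mapping.split())
            table[unichr(int(code, 16))] = folded
    return table


def casefold_sketch(s, table):
    """Fold a text string character by character via the lookup table."""
    return "".join(table.get(ch, ch) for ch in s)


TABLE = load_table(SAMPLE_CASEFOLDING)
```

With the real file loaded instead of the sample, the same two functions would cover the whole table. One caveat: on a narrow Python 2 build, `unichr` rejects code points above U+FFFF, so those entries would need separate handling.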
2
There is a thread here that covers some of the issues (though it may not resolve all of them); you can judge whether it suits what you need. If it doesn't, there are some useful tips for implementing case folding on the W3C site here.

Community

ChrisProsser
1
If PyICU is already installed, you could use it to define casefold(). Using the same example strings as in @Russ' answer:
>>> import icu
>>> casefold = lambda u: unicode(icu.UnicodeString(u).foldCase())
>>> print casefold(u"tschüß")
tschüss
>>> casefold(u"ΣίσυφοςfiÆ") == casefold(u"ΣΊΣΥΦΟσFIæ") == u"σίσυφοσfiæ"
True
>>> icu.UNICODE_VERSION
'6.3'
>>> import unicodedata
>>> unicodedata.unidata_version
'5.2.0'