15

This is more of an 'interesting' phenomenon I encountered in a Python module that I'm trying to understand, rather than a request for help (though a solution would also be useful).

>>> import fuzzy
>>> s = fuzzy.Soundex(4)
>>> a = "apple"
>>> b = a
>>> sdx_a = s(a)
>>> sdx_a
'A140'
>>> a
'APPLE'
>>> b
'APPLE'

Yeah, so the fuzzy module totally violates the immutability of strings in Python. Is it able to do this because it is a C extension? And does this constitute a bug in CPython as well as in the module, or even a security risk?

Also, can anyone think of a way to get around this behaviour? I would like to be able to keep the original capitalisation of the string.

Cheers,

Alex

  • I don't see anywhere in the generated C where it mutates the string. – Ignacio Vazquez-Abrams Apr 30 '12 at 03:32
  • @IgnacioVazquez-Abrams: maybe I'm missing something, but doesn't it mutate it in `__call__` [`__pyx_f_5fuzzy_7Soundex___call__`]? It declares a cdef char ptr which it sets equal to the result of a PyString_AsString call, and then modifies the contents. – DSM Apr 30 '12 at 03:47
  • @DSM: Not in the code in Bitbucket. I only see reads from it, on [line 891](https://bitbucket.org/yougov/fuzzy/src/c210ad2f3f68/src/fuzzy.c#cl-891). – Ignacio Vazquez-Abrams Apr 30 '12 at 03:52
  • @IgnacioVazquez-Abrams: ah. I was looking at the released version, not trunk. – DSM Apr 30 '12 at 04:02

4 Answers

13

This bug was resolved back in February; update your version.

To answer your question, yes, there are several ways to modify immutable types at the C level. The security implications are unknown, and possibly even unknowable, at this point.
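As one illustration of the point above, here is a minimal sketch that reaches into an immutable object from pure Python via ctypes. It is not what fuzzy does internally; it only shows that nothing in CPython stops code holding a raw pointer from writing into the buffer. It uses a bytes object (the Python 3 counterpart of the Python 2 str in the question), relies on CPython implementation details, and is undefined behaviour as far as the language is concerned, so treat it as a demonstration only.

```python
import ctypes

# Build the bytes object at runtime so it is a fresh object rather than a
# compile-time constant shared with other code.
a = bytes(bytearray(b"apple"))
b = a  # a second name for the same object, as in the question

# When a bytes object is passed where ctypes expects a void*, CPython's
# ctypes hands over a pointer to the object's internal buffer, so memmove
# writes straight into the "immutable" object.
ctypes.memmove(a, b"APPLE", len(a))

print(a)  # b'APPLE' on CPython
print(b)  # b'APPLE' as well: both names refer to the one mutated object
```

Because a and b are two names for a single object, the change shows up through both, which matches the behaviour in the question.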

Ignacio Vazquez-Abrams
  • Thanks for this answer! Actually, I used easy_install to install fuzzy only three weeks ago. The version it's giving me is fuzzy-1.0-py2.7-win-amd64.egg, and it's this version that has the error. – Alex May 01 '12 at 01:08
  • @Alex: They don't always keep it up to date there; install from Bitbucket. – Ignacio Vazquez-Abrams May 01 '12 at 01:29
2

I don't have the fuzzy module available to test right now, but the following creates a string with a new identity:

>>> a = "hello"
>>> b = ''.join(a)
>>> b
'hello'
>>> id(a), id(b)
(182894286096, 182894559280)
Greg Hewgill
2

I don't know much about CPython, but it looks like fuzzy.c declares char *cs = s, where s is the input to __call__. It then writes to cs[i], and because cs points at the original string's own buffer rather than at a copy, that also changes s[i] and therefore the original string. This is definitely a bug in fuzzy, and you should file it on Bitbucket. As Greg's answer says, using ''.join(a) will create a new copy.

Venge
0

If it changes an immutable string, that's a bug. You can work around it by passing in an uppercased copy:

s(a.upper())
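The same idea can be wrapped in a small helper. This is only a sketch: encode_preserving_input and fake_encoder are hypothetical names, and fake_encoder is a stand-in that records its argument, since the real fuzzy.Soundex is not assumed to be installed.

```python
def encode_preserving_input(encoder, text):
    """Call encoder on text.upper() instead of on text itself.

    If the encoder uppercases its argument in place, it can only touch the
    already-uppercase value it was given, so text keeps its capitalisation.
    """
    return encoder(text.upper())


# Hypothetical stand-in for fuzzy.Soundex(4): it records what it receives
# and returns the first character as a dummy "code".
seen = []


def fake_encoder(value):
    seen.append(value)
    return value[:1]


a = "apple"
code = encode_preserving_input(fake_encoder, a)
print(a)     # the original string, capitalisation intact
print(seen)  # the encoder only ever saw the uppercased value
```

With the real module, the call would presumably look like encode_preserving_input(s, a), where s is the fuzzy.Soundex(4) instance from the question.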
HYRY