Suppressing treatment of string as iterable

Question

UPDATE:

An idea to make built-in strings non-iterable was proposed on python.org in 2006. My question differs in that I'm trying to suppress this feature only once in a while; still, the whole thread is quite relevant.

Here are the critical comments by Guido who implemented non-iterable str on a trial basis:

[...] I implemented this (it was really simple to do) but then found I had to fix tons of places that iterate over strings. For example:

The sre parser and compiler use things like set("0123456789") and also iterate over the characters of the input regexp to parse it.

difflib has an API defined for either two lists of strings (a typical line-by-line diff of a file), or two strings (a typical intra-line diff), or even two lists of anything (for a generalized sequence diff).

small changes in optparse.py, textwrap.py, string.py.

And I'm not even at the point where the regrtest.py framework even works (due to the difflib problem).

I'm abandoning this project; the patch is SF patch 1471291. I'm no longer in favor of this idea; it's just not practical, and the premise that there are few good reasons to iterate over a string has been refuted by the use cases I found in both sre and difflib.

ORIGINAL QUESTION:

While it's a neat feature of the language that a string is an iterable, when combined with duck typing, it may lead to disaster:

# record has to support [] operation to set/retrieve values
# fields has to be an iterable that contains the fields to be set
def set_fields(record, fields, value):
  for f in fields:
    record[f] = value

set_fields(weapon1, ('Name', 'ShortName'), 'Dagger')
set_fields(weapon2, ('Name',), 'Katana')
set_fields(weapon3, 'Name', 'Wand') # I was tired and forgot to put parentheses

No exception will be raised, and there's no easy way to catch this except by testing for isinstance(fields, str) in a myriad places. In some circumstances, this bug will take a very long time to find.
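To make the failure concrete, here is a minimal sketch of what the third call actually does. It assumes the record is a plain dict, which the question does not state; any object supporting `[]` assignment would behave the same way.

```python
def set_fields(record, fields, value):
    for f in fields:
        record[f] = value

weapon3 = {}  # assumption: a plain dict stands in for the record
set_fields(weapon3, 'Name', 'Wand')  # parentheses and comma forgotten

# The string is iterated character by character, so four
# one-letter keys are created and no 'Name' key is ever set.
print(weapon3)  # {'N': 'Wand', 'a': 'Wand', 'm': 'Wand', 'e': 'Wand'}
```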

I want to disable strings from being treated as an iterable entirely in my project. Is it a good idea? Can it be done easily and safely?

Perhaps I could subclass built-in str such that I would need to explicitly call get_iter() if I wanted its object to be treated as an iterable. Then whenever I need a string literal, I would instead create an object of this class.
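A rough sketch of that subclass idea, for Python 3 only. The class name `ExplicitIterStr` is made up for illustration, and `get_iter()` is the hypothetical method named above, not an existing API.

```python
class ExplicitIterStr(str):
    """Hypothetical str subclass: iteration must be requested explicitly."""

    def __iter__(self):
        # Refuse implicit iteration (for loops, list(), set(), ...).
        raise TypeError("iterate with get_iter() instead")

    def get_iter(self):
        # Delegate to the built-in str iterator on request.
        return str.__iter__(self)


name = ExplicitIterStr('Name')
try:
    for ch in name:
        pass
except TypeError as exc:
    print('refused:', exc)

print(list(name.get_iter()))  # ['N', 'a', 'm', 'e']
```

Every string that should be protected has to be wrapped in this class by hand; ordinary string literals remain plain `str`.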

Here are some tangentially related questions:

How can I tell if a python variable is a string or a list?

how to tell a variable is iterable but not a string

I think you have basically answered your own question. Your two methods are the best ways if you have to do it, but the best answer is just make sure it doesn't happen. — Gareth Latty, Feb 06 '12 at 23:35
I'd just stick with the `isinstance(fields, str)` check – you're unlikely to ever need the ability to make your own types that quack like a string. Alternately, make `fields` the last, varargs argument. (Although this won't help if you get tired and forget you're *not* supposed to put parentheses around it.) — millimoose, Feb 06 '12 at 23:52
Any library/language in which strings are defined as generic lists of chars will have this problem. It doesn't seem like a Python thing. — Apalala, Feb 12 '12 at 21:49

kindall · Accepted Answer · 2012-02-06T23:54:07.433

score 8

There aren't any ways to do this automatically, unfortunately. The solution you propose (a str subclass that isn't iterable) suffers from the same problem as isinstance() ... namely, you have to remember to use it everywhere you use a string, because there's no way to make Python use it in place of the native class. And of course you can't monkey-patch the built-in objects.

I might suggest that if you find yourself writing a function that takes either an iterable container or a string, maybe there's something wrong with your design. Sometimes you can't avoid it, though.

In my mind, the least intrusive thing to do is to put the check into a function and call that when you get into a loop. This at least puts the behavior change where you are most likely to see it: in the for statement, not buried away somewhere in a class.

def iterate_no_strings(item):
    if isinstance(item, str):   # isinstance(item, basestring) for Py 2.x
        return iter([item])
    else:
        return iter(item)

for thing in iterate_no_strings(things):
    ...  # do something with thing
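As a usage sketch, here is such a helper applied to the `set_fields` function from the question. It assumes the check is done with `isinstance()` and that the records are plain dicts; both are assumptions for illustration.

```python
def iterate_no_strings(item):
    # Treat a lone string as a one-element sequence.
    if isinstance(item, str):
        return iter([item])
    return iter(item)

def set_fields(record, fields, value):
    for f in iterate_no_strings(fields):
        record[f] = value

weapon2, weapon3 = {}, {}
set_fields(weapon2, ('Name', 'ShortName'), 'Katana')
set_fields(weapon3, 'Name', 'Wand')  # a bare string is now safe
print(weapon2)  # {'Name': 'Katana', 'ShortName': 'Katana'}
print(weapon3)  # {'Name': 'Wand'}
```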

edited Feb 06 '12 at 23:54

answered Feb 06 '12 at 23:39


+1. This is a nice answer if you *have* to do it. I still recommend against it, however. – Gareth Latty Feb 06 '12 at 23:41
What about the function I gave as an example? Would you say this is a case of "wrong design" or "can't avoid it"? – max Feb 06 '12 at 23:43
I kinda waver back and forth. Sometimes I want to say "be liberal in what you accept" and "try to do what the user obviously wants, if possible." In your particular case though, maybe take the value first and the names you want to set as `*args`? Then you'll always get an iterable and the caller just specifies as many names as they have. If they already have a tuple then they just unpack it when calling you. – kindall Feb 06 '12 at 23:47
... and to play devil's advocate to myself, it would be better to put the names first (to match things like `getattr()` and `setattr()`). Like I said, I waver. How about `**kwargs` so you can just specify `Name='Dagger', ShortName='Dagger'` without being too unwieldy? – kindall Feb 06 '12 at 23:57
@kindall That then trades off having to repeat the value. – Gareth Latty Feb 07 '12 at 00:09
Yeah, that would get ugly if there were a lot of attributes, but if it was ever only a couple attributes it might be the least evil. Or you could use some notation to grab values from other args (e.g. `ShortName='@Name'`). – kindall Feb 07 '12 at 00:19
@kindall I think by the time you are doing that, the better option is my class way or, if in python 3, the examples with extended tuple unpacking I gave. – Gareth Latty Feb 07 '12 at 01:34
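The `**kwargs` variant discussed in these comments might look like the following sketch, assuming a plain dict as the record. Each field carries its own value, which is the repetition trade-off mentioned above.

```python
def set_fields(record, **fields):
    # Each keyword is a field name; each field carries its own value.
    for name, value in fields.items():
        record[name] = value

weapon1 = {}  # assumption: a plain dict stands in for the record
set_fields(weapon1, Name='Dagger', ShortName='Dagger')
print(weapon1)  # {'Name': 'Dagger', 'ShortName': 'Dagger'}
```

One limitation of this form: a field that is itself called `record` cannot be passed as a keyword, because it collides with the first parameter.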

Gareth Latty · Answer 2 · 2012-02-08T10:39:12.933

To expand, and make an answer out of it:

No, you shouldn't do this.

It changes the functionality people expect from strings.
It means extra overhead throughout your program.
It's largely unnecessary.
Checking types is very unpythonic.

You can do it, and the methods you have given are probably the best ways (~~for the record, I think sub-classing is the better option~~ if you have to do it, see @kindall's method), but it's simply not worth doing, and it's not very pythonic. Avoid the bugs in the first place. In your example, you might want to ask yourself whether the real issue is the clarity of your arguments, and whether named arguments or the splat might be a better solution.

E.g: Change the ordering.

def set_fields(record, value, *fields):
  for f in fields:
    record[f] = value

set_fields(weapon1, 'Dagger', *('Name', 'ShortName')) #If you had a tuple you wanted to use.
set_fields(weapon2, 'Katana', 'Name')
set_fields(weapon3, 'Wand', 'Name')

E.g: Named arguments.

def set_fields(record, fields, value):
  for f in fields:
    record[f] = value

set_fields(record=weapon1, fields=('Name', 'ShortName'), value='Dagger')
set_fields(record=weapon2, fields=('Name',), value='Katana')
set_fields(record=weapon3, fields='Name', value='Wand') #I find this easier to spot.

If you really want the order the same, but don't think the named arguments idea is clear enough, then what about making each record a dict-like item instead of a dict (if it isn't already) and having:

class Record:
    ...
    def set_fields(self, *fields, value):
        for f in fields:
            self[f] = value

weapon1.set_fields("Name", "ShortName", value="Dagger")

The only issues here are the introduced class and the fact that the value parameter has to be passed by keyword (keyword-only parameters after *fields require Python 3), although that keeps it clear.

Alternatively, if you are using Python 3, you always have the option of using extended tuple unpacking:

def set_fields(*args):
    record, *fields, value = args
    for f in fields:
        record[f] = value

set_fields(weapon1, 'Name', 'ShortName', 'Dagger')
set_fields(weapon2, 'Name', 'Katana')
set_fields(weapon3, 'Name', 'Wand')

Or, for my last example:

class Record:
    ...
    def set_fields(self, *args):
        *fields, value = args
        for f in fields:
            self[f] = value

weapon1.set_fields("Name", "ShortName", "Dagger")

However, these do leave some weirdness when reading the function calls, due to the fact one usually assumes that arguments would not be handled this way.

I know it's unpythonic, that's why I feel bad doing that... But how can I avoid these bugs? We're talking about literally missing a pair of parentheses.. it's almost impossible to avoid once in a while, no? — max, Feb 06 '12 at 23:41
@max As I say, I think that's a problem in how you are structuring your arguments in your method more than a problem with string iteration. — Gareth Latty, Feb 06 '12 at 23:43

score 4 · Answer 3 · answered Feb 07 '12 at 12:26

Type checking in this case is not unpythonic or bad. Just do a:

if isinstance(var, (str, bytes)):
    var = [var]

at the start of the function. Or, if you want to educate the caller:

if isinstance(var, (str, bytes)):
    raise TypeError("Var should be an iterable, not str or bytes")
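Put together with the `set_fields` function from the question, the second variant could look like this sketch. The plain dict record and the error message wording are assumptions for illustration.

```python
def set_fields(record, fields, value):
    # Reject a lone str/bytes up front instead of iterating its characters.
    if isinstance(fields, (str, bytes)):
        raise TypeError("fields should be an iterable of names, not str or bytes")
    for f in fields:
        record[f] = value

weapon3 = {}  # assumption: a plain dict stands in for the record
try:
    set_fields(weapon3, 'Name', 'Wand')  # the mistake is reported immediately
except TypeError as exc:
    print(exc)

set_fields(weapon3, ('Name',), 'Wand')
print(weapon3)  # {'Name': 'Wand'}
```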

score 2 · Answer 4 · answered Feb 07 '12 at 02:03

What do you think about creating a non-iterable string?

class non_iter_str(str):
    def __iter__(self):
        yield self

>>> my_str = non_iter_str('stackoverflow')
>>> my_str
'stackoverflow'
>>> my_str[5:]
'overflow'
>>> for s in my_str:
...   print s
... 
stackoverflow
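One caveat worth checking before relying on this, shown here as a Python 3 sketch (the session above uses the Python 2 print statement): string operations such as slicing return a built-in `str`, so the result is iterable character by character again.

```python
class non_iter_str(str):
    def __iter__(self):
        yield self

my_str = non_iter_str('stackoverflow')
print(list(my_str))        # ['stackoverflow']

tail = my_str[5:]          # slicing returns a built-in str
print(type(tail) is str)   # True
print(list(tail))          # ['o', 'v', 'e', 'r', 'f', 'l', 'o', 'w']
```

Any derived string would have to be wrapped in `non_iter_str` again to keep the behavior.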

answered Feb 07 '12 at 02:03

juliomalegria


That's what I was thinking originally; but @kindall mentioned this disadvantage, among others: "you have to remember to use it everywhere you use a string", including by the other users of my code. – max Feb 07 '12 at 02:09

Ethan Furman · Answer 5 · 2012-02-07T16:25:40.287

score 0

Instead of trying to make your strings non-iterable, switch the way you are looking at the problem: One of your parameters is either an iterable, or a ...

string
int
custom class
etc.

When you write your function, the first thing you do is validate your parameters, right?

def set_fields(record, fields, value):
    if isinstance(fields, str):
        fields = (fields, )  # tuple-ize it!
    for f in fields:
        record[f] = value

This will serve you well as you deal with other functions and parameters that can be either singular, or pluralized.
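The same normalization can be pulled into a small helper that also covers the non-iterable cases listed above, such as an int. This is only a sketch: `as_fields` is a made-up name, and the `Iterable` check does not recognize objects that are iterable solely through `__getitem__`.

```python
from collections.abc import Iterable

def as_fields(obj):
    # Hypothetical helper: wrap a lone str/bytes or any non-iterable in a tuple.
    if isinstance(obj, (str, bytes)) or not isinstance(obj, Iterable):
        return (obj,)
    return obj

print(as_fields('Name'))                 # ('Name',)
print(as_fields(42))                     # (42,)
print(as_fields(['Name', 'ShortName']))  # ['Name', 'ShortName']
```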

edited Feb 07 '12 at 16:25

answered Feb 07 '12 at 04:20


This is very unpythonic. What if you want to use a list, or any other iterable, rather than a tuple? Python is a duck-typed language; it's not a good idea to type-check, as it defies the ideals of the language. – Gareth Latty Feb 07 '12 at 04:28
Don't check that it is a tuple. Check that it is not a string or bytes. – Lennart Regebro Feb 07 '12 at 12:27
@LennartRegebro: Thanks -- hearing it a different way made it click for me. Answer updated. – Ethan Furman Feb 07 '12 at 16:27
@Lattyware: As Lennart said, my mistake was in checking for a `tuple` instead of checking that it wasn't a `str`. `isinstance` has its place, and this is one of them. Answer updated. – Ethan Furman Feb 07 '12 at 16:28
