13

In Python 3.5, is it possible to predict when we will get an interned string and when we will get a copy? After reading a few Stack Overflow answers on this issue, I found this one the most helpful, but still not comprehensive. Then I looked at the Python docs, but they indicate that interning is not guaranteed by default:

Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

So my question is about the internal conditions for interning, i.e. how the decision is made whether to intern a string literal or not: why does the same piece of code work on one system and not on another, and what rules did the author of the answer in the mentioned topic mean when saying:

the rules for when this happens are quite convoluted

wim
  • Just use `==` and forget about it. It's an implementation detail anyway. – wim Mar 04 '16 at 20:51
  • 3
    @erip I believe OP is aware of that. After getting through the boilerplate, this question seems to be asking about the interning rules. – timgeb Mar 04 '16 at 20:53
  • 1
    If you really want to know the differences in implementation, it would probably make sense to specify the Python versions installed on both systems. – Lev Levitsky Mar 04 '16 at 20:53
  • 4
    @wim I don't want to forget, I want to learn and understand. –  Mar 04 '16 at 20:54
  • @LevLevitsky Thanks for editing the question for it to be more relevant. –  Mar 04 '16 at 21:03
  • Then could you clarify your question and remove all the irrelevant preamble about `==`? Is your question "when will a string be interned in CPython?" Note that this is no longer a Python question, because Python the language may not even have string interning. – wim Mar 04 '16 at 21:05
  • 1
    @wim I'd love to, but my experience with Python is not very high, so you're welcome to edit the question by yourself as you see it –  Mar 04 '16 at 21:10
  • OK, I will edit it. But I'm not sure exactly what your question is, because it's rambling a bit. Are you asking "when will a string be interned in CPython?" *note:* You should add your specific version because there are many builds of Python 3 – wim Mar 04 '16 at 21:12
  • @wim Yes, when will a string be interned in cpython –  Mar 04 '16 at 21:17
  • 1
    The only rule is that the return value of `intern` is interned. Everything else is a morass of implementation details, inconsistent because there's little point to being consistent. – user2357112 Mar 04 '16 at 21:18
  • I've edited the content to discourage the kind of useless answers this question was attracting (the ones which don't tell you anything you don't already know). If you don't think it's an improvement, feel free to roll back. – wim Mar 04 '16 at 22:33
  • @wim Thanks for refactoring, I appreciate your help –  Mar 04 '16 at 22:34

2 Answers

9

You think there are rules?

The only rule for interning is that the return value of intern (sys.intern in Python 3) is interned. Everything else is up to the whims of whoever decided some piece of code should or shouldn't do interning. For example, "left" gets interned by PyCode_New:

/* Intern selected string constants */
for (i = PyTuple_GET_SIZE(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!all_name_chars(v))
        continue;
    PyUnicode_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}

The "rule" here is that a string object in the co_consts of a Python code object gets interned if it consists purely of ASCII characters that are legal in a Python identifier. "left" gets interned, but "as,df" wouldn't be, and "1234" would be interned even though an identifier can't start with a digit. While identifiers can contain non-ASCII characters, such characters are still rejected by this check. Actual identifiers don't ever pass through this code; they get unconditionally interned a few lines up, ASCII or not. This code is subject to change, and there's plenty of other code that does interning or interning-like things.
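As a rough illustration, the check described above can be approximated in Python. This is only a sketch: `looks_like_name` and `NAME_CHARS` are hypothetical names that mirror what `all_name_chars` is described as doing in this version of the C source, and they predict nothing about other versions or implementations.

```python
import string

# Assumption: the accepted characters are ASCII letters, digits and the
# underscore, as described for this version of CPython. Not a stable rule.
NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def looks_like_name(constant):
    """Return True if every character is an ASCII identifier character."""
    return isinstance(constant, str) and all(ch in NAME_CHARS for ch in constant)


# The sample strings from the answer, plus one non-ASCII string.
samples = ["left", "as,df", "1234", "caf\u00e9"]
results = {s: looks_like_name(s) for s in samples}
print(results)
```

Note that the helper only says which constants this particular loop would pick; it does not tell you whether some other code path has interned a given string.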

Asking us for the "rules" for string interning is like asking a meteorologist what the rules are for whether it rains on your wedding. We can tell you quite a lot about how it works, but it won't be much use to you, and you'll always get surprises.

user2357112
-4

From what I understood from the post you linked:

When you use if a == b, you are checking whether the value of a equals the value of b, whereas when you use if a is b, you are checking whether a and b are the same object (i.e. occupy the same spot in memory).
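A small sketch of that difference, using lists rather than strings so that the outcome does not depend on interning at all (the variable names are just for illustration):

```python
first = [1, 2, 3]
second = [1, 2, 3]   # equal contents, but a separately created list
alias = first        # a second name for the same object

print(first == second)  # True: the values are equal
print(first is second)  # False: two distinct objects
print(first is alias)   # True: both names refer to one object
```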

Now, Python interns constant strings (those written as literals, like "blabla"). So:

>>> a = "abcdef"
>>> a is "abcdef"
True

But when you do:

>>> a = "".join([chr(i) for i in range(ord('a'), ord('g'))])
>>> a
'abcdef'
>>> a is "abcdef"
False

In the C programming language, a string literal written with "" is stored as static data and is usually accessed through a const char *. I think something similar is happening here.
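If identity really matters, the one behaviour that can be relied on is that `sys.intern` returns the canonical copy. A minimal sketch, reusing the strings from the session above (the variable names are illustrative):

```python
import sys

# Build "abcdef" at run time, as in the session above.
built = "".join(chr(i) for i in range(ord("a"), ord("g")))
literal = "abcdef"

# Equality always holds; whether the two objects are identical
# is an implementation detail and may differ between systems.
print(built == literal)

# After explicit interning, both names refer to the same object.
interned_built = sys.intern(built)
interned_literal = sys.intern(literal)
print(interned_built is interned_literal)
```

For ordinary comparisons, `==` remains the right tool; explicit interning is mainly useful when many equal strings are compared or stored repeatedly.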

Rolbrok