How does one add string prefixes to variables in python?

Question

The term 'string prefix' is explained here.

If a string has already been assigned to a variable, how do you apply a string prefix to it (without retyping the same string literal)? The result can be assigned to a new variable or reassigned to the same one.

"String Encoding declarations" is not an actual term, and it would be a terrible term if it was, as it has nothing to do with encoding. Some rando just edited their own made-up terminology into that answer. — user2357112, Jul 08 '20 at 19:53
@user2357112 supports Monica people make up new terms all the time, the way language works is if you understand what they are saying. Do you understand what they are talking about? — , Jul 08 '20 at 19:56
If it didn't literally appear in the sentence "'Letters before strings here are called "String Encoding declarations".", I would not know what they were talking about. I would have guessed it was a misleading term for a [PEP 263 encoding comment](https://www.python.org/dev/peps/pep-0263/), which is a completely different thing. There is only one google hit for `python "string encoding declaration"` using the "term" in this way, and it's by the guy who made the edit. — user2357112, Jul 08 '20 at 20:03
@user2357112 supports Monica I would think it's something that needs a name, does 'prefix' or 'string prefix' work at least? — , Jul 08 '20 at 20:11
You can't ping arbitrary users like that on Stack Overflow. jdi didn't receive your attempted ping. jdi wasn't the person who inserted that term into the answer anyway - [edit history](https://stackoverflow.com/posts/11279428/revisions) shows it was [this guy](https://stackoverflow.com/users/445131/eric-leschinski), and judging by their answer [here](https://stackoverflow.com/questions/12937172/what-does-u-before-a-string-in-python-mean), they misread the docs for encoding comments. — user2357112, Jul 08 '20 at 20:16
@user2357112 supports Monica I don't have the rep to comment there. Didn't jdi accept the edit though? So prefix then? Since that term was used before? — , Jul 08 '20 at 20:21
jdi has been barely active for years, and had no activity for a window of time over a month long surrounding the edit. They were probably not paying attention. — user2357112, Jul 08 '20 at 20:23
@user2357112 supports Monica ok thanks for looking into that. I changed the wording, but can be changed again if still confusing. — , Jul 08 '20 at 20:24
I think this is a debate about the terminology. I would suggest it right now. — Stathis Alexopoulos, Jul 08 '20 at 20:26
@Stathis Alexopoulos I'm not debating, I'll accept whatever it is, if it has to be written out in explanation each time, ok, but i think terms are preferable. — , Jul 08 '20 at 20:31
What does this have to do with Python 3? [The `u` prefix only exists to accommodate legacy code now](https://stackoverflow.com/a/2464968/4518341), and it doesn't actually do anything. Are you asking about different prefixes? They all work differently. See [Transform string to f-string](https://stackoverflow.com/q/44757222/4518341), [Convert regular Python string to raw string](https://stackoverflow.com/q/4415259/4518341), [Convert String to Byte](https://stackoverflow.com/a/40235958/4518341) — wjandrea, Jul 08 '20 at 20:53
@wjandrea so for the answer one would have to make a function for all the cases, depending on the letter selected, that's what it sounds like. — , Jul 08 '20 at 21:03
@dsfgh A function? I guess you could write one, but I wouldn't. The different prefixes are fundamentally different things. For example f-strings don't correspond 1-to-1 with `str.format` since they can contain expressions, and [those expressions are not restricted, so naively evaluating them would be a security risk](https://stackoverflow.com/a/47599254/4518341). I'll see if I can write you an answer to explain the others. — wjandrea, Jul 08 '20 at 21:16
@wjandrea cool maybe I'll make the function myself, any useful info helps. — , Jul 08 '20 at 21:33
@dsfgh Not to burst your bubble, but the only reason to write that function would be as a learning exercise. If you *need* a function that does that, you're probably doing something wrong. — wjandrea, Jul 09 '20 at 00:07
BTW forgot to mention, welcome to SO! Check out the [tour], and [ask] if you want advice. Also, one of your edits chopped a sentence in half: "According to how the term is used here," ... — wjandrea, Jul 09 '20 at 00:09

score 2 · Answer 1 · answered Jul 08 '20 at 20:36

You can't retroactively add or remove a string literal prefix. Once the literal has been evaluated, it's just a `str` (or `bytes` with a `b` prefix). If you need to convert something that was a `bytes` literal to `str` or vice versa, you use the `bytes.decode` or `str.encode` method respectively, like you would on any `bytes` or `str`, regardless of whether it began as a literal or not, because there is no difference between literal and non-literal strings once the literal has been evaluated.
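
A minimal sketch of the point above, assuming Python 3 (the variable names are illustrative only): a prefixed literal evaluates to an ordinary `str` or `bytes` object, and converting between the two goes through `str.encode` / `bytes.decode` with an explicit encoding rather than through a prefix.

```python
# Assumes Python 3; variable names are illustrative only.
sg = 'stng'

# The u prefix changes nothing: both literals evaluate to equal str objects.
assert u'stng' == sg
assert type(u'stng') is str

# The r prefix only changes how the source text is read;
# the result is still a plain str.
assert r'a\nb' == 'a\\nb'
assert type(r'a\nb') is str

# str -> bytes and back needs an explicit encoding, not a prefix.
as_bytes = sg.encode('utf-8')
assert as_bytes == b'stng'
assert as_bytes.decode('utf-8') == sg
```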

ShadowRanger

Pedantic note I didn't want in the answer itself: Technically, `str` literals might be interned where non-literal `str`s rarely are, but this is not relevant to use unless you're incorrectly using `is` for string comparisons. – ShadowRanger Jul 08 '20 at 20:39
So if you have sg = 'stng' how can you get sgl = u'stng' from sg (without using 'stng') – Jul 08 '20 at 20:42
@dsfgh: If you're on Python 3, they're both the same string. `u` is an optional, meaningless prefix on Python 3 for aid in porting Python 2 scripts. It just says "this string can contain Unicode", but on Python 3, that's how all `str` are already. – ShadowRanger Jul 08 '20 at 20:43
Ok an example with another string prefix then. – Jul 08 '20 at 21:03
@dsfgh: Did you see my first sentence? It *cannot* be done in the general case. You can change *certain* `str` to look like `bytes` literals (`'stng'.encode('latin-1')`) but that assumes they contain only latin-1 characters; if they contain non-latin-1 characters, there is no single equivalent byte. Literals are *literals*; the literal itself possesses or lacks a prefix, you can't retroactively change it. – ShadowRanger Jul 08 '20 at 21:11
if errors occur there could be a notification that it cannot be done. I think the answer would be a function where the prefix can be selected. – Jul 08 '20 at 21:21
But yeah saying there is no general way is a useful answer to point out also. – Jul 08 '20 at 21:30
@dsfgh: Trying to write such a prefix selecting function is difficult to the point where, given the limited utility of such a function and the inability to cover all cases, the "benefits" are not worth the trouble of writing. The whole idea is nonsensical to start with, since, as I said, the prefixes only have meaning in terms of literals; stuff like `r` and `f` prefixes are literally impossible to graft on after the fact with anything but the loosest heuristics, and `b` is almost as bad. – ShadowRanger Jul 08 '20 at 22:21
@dsfgh: You're trying to unscramble a scrambled egg and boil it instead. – user2357112 Jul 08 '20 at 22:37
@user2357112 supports Monica that makes it sound like it's hard to do – Jul 08 '20 at 22:52

wjandrea · Answer 2 · 2020-07-09T00:20:13.407

In general, you can't. String prefixes are part of syntax, not data. In other words, they don't create a different type of string, but create a string in a different way.

`u` does nothing in Python 3. It only exists for compatibility with Python 2.
`f` can be emulated with `str.format()` for simple cases. Fully emulating an f-string would mean evaluating it, which is a security risk since f-strings can contain arbitrary code.
`r` can be emulated with `str.encode('unicode_escape').decode()` in some cases, but not all, because evaluating a non-raw literal is lossy. For example:
```
>>> r'\x61'
'\\x61'
>>> s = '\x61'
>>> s
'a'
>>> s.encode('unicode_escape').decode()
'a'
```
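
For the `f` case mentioned above, here is a rough sketch of where `str.format()` can and cannot stand in for an f-string. It assumes Python 3.6+; `template`, `name`, and `expression_failed` are illustrative names, not anything from the question.

```python
# Assumes Python 3.6+; all names here are illustrative only.
name = 'world'

# An f-string literal is evaluated where it appears in the source.
from_literal = f'Hello, {name}!'

# A plain str that merely looks like an f-string body is just data.
template = 'Hello, {name}!'

# For simple field names, str.format can fill in the placeholders.
from_format = template.format(name=name)
assert from_format == from_literal

# Expressions are where the emulation stops: str.format treats the
# field as an attribute lookup rather than evaluating it as code.
expression_failed = False
try:
    '{name.upper()}'.format(name=name)
except AttributeError:
    expression_failed = True
```

Because `str.format()` never evaluates expressions, it avoids the security risk but also cannot reproduce everything an f-string can do.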

`b` is an exception in that it actually does create a different type: a `bytes` object. It can be emulated with the `raw_unicode_escape` encoding, though I don't have any experience using it, so I'm not sure the result is always the same:

```
>>> b'a\x89\u2013'
b'a\x89\\u2013'
>>> 'a\x89\u2013'
'a\x89–'
>>> 'a\x89\u2013'.encode('raw_unicode_escape')
b'a\x89\\u2013'
>>> 'a\x89\u2013'.encode('raw_unicode_escape').decode('raw_unicode_escape')
'a\x89–'
```
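
One difference worth checking, sketched here under the assumption of Python 3 (the sample strings are illustrative only): `raw_unicode_escape` leaves existing backslashes unescaped, so two different strings can encode to the same bytes and the round trip is not always faithful.

```python
# Assumes Python 3; the sample strings are illustrative only.

# Two different str values: one en dash character versus the six
# characters backslash, u, 2, 0, 1, 3.
dash = '\u2013'
spelled_out = '\\u2013'
assert dash != spelled_out

# raw_unicode_escape does not escape existing backslashes, so both
# strings encode to the same bytes.
assert dash.encode('raw_unicode_escape') == spelled_out.encode('raw_unicode_escape')

# Decoding therefore cannot recover the second string.
round_trip = spelled_out.encode('raw_unicode_escape').decode('raw_unicode_escape')
assert round_trip == dash

# An encoding such as UTF-8 round-trips any str, but the resulting
# bytes no longer resemble the original text.
assert dash.encode('utf-8') == b'\xe2\x80\x93'
assert dash.encode('utf-8').decode('utf-8') == dash
```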

Also just for reference, the grammar calls them stringprefix, and just "prefix" in the text.

score 0 · Answer 3 · answered Jul 08 '20 at 20:45

According to the Python 2 manual:

Unicode Literals in Python Source Code

In Python source code, Unicode literals are written as strings prefixed with the ‘u’ or ‘U’ character: u'abcdefghijk'. Specific code points can be written using the \u escape sequence, which is followed by four hex digits giving the code point. The \U escape sequence is similar, but expects 8 hex digits, not 4.

But in Python 3

The String Type

Since Python 3.0, the language’s str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.

The default encoding for Python source code is UTF-8, so you can simply include a Unicode character in a string literal:

As for variables that already exist, whether they came from user input, a file, or anywhere else, you have to check how each method you use handles Unicode.
