Does the Python interpreter automatically recognise a UTF-8 BOM?

Question

This question asks why specifying the Python interpreter before the path to a Python script makes the script run even when it is encoded as UTF-8 with a BOM.

My Python script /home/osmc/python/test3.py is executable and has the shebang #!/usr/bin/python at the start of the script. I run the script in OSMC, which is a Linux OS based on Debian. It uses the /bin/bash shell.

I understand that when you encode a Python script using UTF-8 (with BOM), the Byte Order Mark (BOM) is a signature (the three bytes EF BB BF) inserted at the start of the script, and this prevents the Linux system from correctly reading the shebang line. As a result, the system doesn't know which interpreter to use to run the script, so it won't run it.
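To make the byte-level picture concrete, here is a minimal sketch in Python 3. The script contents and the helper `kernel_sees_shebang` are hypothetical, made up for illustration; the helper only imitates the check that the first two bytes of the file are `#!` and is not the real kernel code.

```python
import codecs

# Hypothetical script contents, for illustration only.
source = "#!/usr/bin/python\nprint('hello')\n"

plain = source.encode("utf-8")         # UTF-8 without a BOM
with_bom = source.encode("utf-8-sig")  # UTF-8 with a BOM prepended


def kernel_sees_shebang(data: bytes) -> bool:
    """Imitate the shebang check: the file must start with the bytes '#!'."""
    return data[:2] == b"#!"


# The BOM occupies the first three bytes, pushing '#!' to offset 3.
print(with_bom[:3] == codecs.BOM_UTF8)
print(kernel_sees_shebang(plain))
print(kernel_sees_shebang(with_bom))
```

With the BOM in place, the file no longer begins with `#!`, which is consistent with the script failing when it is started by path alone.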

Hence, if the script is encoded in UTF-8 (with BOM), it won't run from the command line when invoked like this:

# /home/osmc/python/test3.py

However it will run like this:

# /usr/bin/python /home/osmc/python/test3.py

My question:

Is this because the Python interpreter automatically detects the byte order mark at the start of the Python script and ignores it? Is the Python interpreter able to detect how the Python script is encoded, and does it therefore know to ignore the BOM?
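One way to probe this, sketched under the assumption that Python 3 is available (the `/usr/bin/python` on the system in question may be a different version): the standard library's `tokenize.detect_encoding` is documented as detecting the encoding that should be used to decode a Python source file. The script contents below are hypothetical.

```python
import io
import tokenize

# Hypothetical script contents, encoded as UTF-8 with a BOM.
source = "#!/usr/bin/python\nprint('hello')\n"
with_bom = source.encode("utf-8-sig")

# detect_encoding takes a readline callable and returns the detected
# encoding together with the lines it had to read to decide.
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(with_bom).readline)

print(encoding)        # reported as 'utf-8-sig' when a BOM is found
print(first_lines[0])  # first line with the BOM already stripped
```

If this reports `utf-8-sig`, the BOM was recognised and removed before the first line was examined, so the shebang is left as an ordinary comment line.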

Run it explicitly via `python /home/osmc/python/test3.py` to find out, though my guess is that with Python 2.7 even this may not work; can't you switch to Python 3? The way you invoke it, Python doesn't even get the chance to see your program. — user1934428, Oct 12 '20 at 11:28
I did run it with /usr/bin/python /home/osmc/python/test3.py and it worked. My question is why? — FlexMcMurphy, Oct 12 '20 at 11:57
Can you clarify what you are struggling with exactly? The shebang and consequently the restriction on the first two bytes is a Linux/UNIX specific thing. The Python Interpreter is a) not Linux/UNIX and does not attempt to interpret the first two bytes as a shebang at all and b) a Python Interpreter so it does not have to (or try to) find a python interpreter. — MisterMiyagi, Oct 12 '20 at 12:12
I clarified my question above to explain what I am struggling with. The linked answer above and your comment have not answered it. Can you reopen the question and let others contribute please? — FlexMcMurphy, Oct 12 '20 at 12:30
I did reopen, but someone else felt it was still too similar. However, it is still not clear to me what information you are missing: the Python interpreter *does not care* about the shebang, because it is purely a Linux/UNIX thing (as the linked duplicate says). There is no conflict between BOM and shebang. — MisterMiyagi, Oct 12 '20 at 12:37
Thanks for your comment which explains the behavior to me a lot more. However the information in the linked question above "Shebang executable not found because of UTF-8 BOM (Byte Order Mark)" does not answer my question. What you just commented, about Python ignoring the shebang, is not included as an answer to that other question. So I feel it was incorrect and unjust to close this question. It should be re-opened and then I could accept your comment as an answer since it made me understand why adding /usr/bin/python before the path to the script makes it run when it is UTF-8 BOM encoded. — FlexMcMurphy, Oct 12 '20 at 12:52
Furthermore, I clarified my question above by asking "Is the python interpreter able to detect how the python script is encoded?" This is a reasonable query given the title of my question. However, this is also not answered in the above linked question, which was used to justify closing my question. My question should be re-opened. — FlexMcMurphy, Oct 12 '20 at 12:57
I'm still not sure what your question actually is, so I cannot properly answer it even if it were re-opened. The shebang is purely a Linux/UNIX thing (*as the other Q&A explains*). It is not clear why you assume *Python* should care about the shebang, and thus why it would affect Python's handling of a UTF-8 BOM. The two questions explicitly asked seem to be answered by the description itself, which shows that, yes, Python can properly run programs with a UTF-8 BOM. — MisterMiyagi, Oct 12 '20 at 13:09
@FlexMcMurphy: Well, if you executed it by explicitly specifying `python ....` it simply tells you that this Python version lives happily with a BOM, which already answers your question. The linked answer dealt with the problem, that you would run your script by just entering the path to the script. After all, you claimed that you would use the shebang line, and for THIS use case, the linked answer is indeed correct. — user1934428, Oct 12 '20 at 14:37
The description above clearly shows three parts to my question. The linked-to answer only helped me understand why supplying just the full path to a Python script won't work if it starts with a BOM. It did not fully answer my question, which is why it is unjust that this was closed. Fortunately @MisterMiyagi helped me answer the second part of my question: supplying the full path to the Python interpreter runs the script because the interpreter doesn't care about the shebang. This was not mentioned in the linked-to answer. Part three: Is the Python interpreter able to detect the script encoding? — FlexMcMurphy, Oct 12 '20 at 17:29


0 Answers