
I wanted to pad a string with null characters ("\x00"). I know lots of ways to do this, so please do not answer with alternatives. What I want to know is: Why does Python's string.format() function not allow padding with nulls?

Test cases:

>>> "{0:\x01<10}".format("bbb")
'bbb\x01\x01\x01\x01\x01\x01\x01'

This shows that hex-escaped characters work in general.

>>> "{0:\x00<10}".format("bbb")
'bbb       '

But "\x00" gets turned into a space ("\x20").

>>> "{0:{1}<10}".format("bbb","\x00")
'bbb       '
>>> "{0:{1}<10}".format("bbb",chr(0))
'bbb       '

Passing the fill character in as a nested argument gives the same result.

>>> "bbb" + "\x00" * 7
'bbb\x00\x00\x00\x00\x00\x00\x00'

This works, but doesn't use string.format.

>>> spaces = "{0: <10}".format("bbb")
>>> nulls  = "{0:\x00<10}".format("bbb")
>>> spaces == nulls
True

Python is clearly substituting spaces (chr(0x20)) instead of nulls (chr(0x00)).

bonsaiviking
  • Please leave a comment when you downvote so I can improve this question. I have done my research and know about `ljust` and other ways of accomplishing the task. I want to know why Python 2.7 behaves this way. – bonsaiviking May 24 '13 at 18:40
  • Use `print "bbb" + "\x00" * 7` and you'll get a string with 7 spaces. The shell always prints "\x00" as a space character. Without print, the shell returns the `repr` version of the string. – Ashwini Chaudhary May 25 '13 at 05:28

3 Answers

4

Digging into the source code for Python 2.7, I found that the issue is in this section from ./Objects/stringlib/formatter.h, lines 718-722 (in version 2.7.3):

/* Write into that space. First the padding. */
p = fill_padding(STRINGLIB_STR(result), len,
                 format->fill_char=='\0'?' ':format->fill_char,
                 lpad, rpad);

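The trouble is that a zero/null character ('\0') doubles as the internal default meaning "no padding character specified", so an explicit "\x00" fill cannot be told apart from no fill at all and gets replaced with a space. The default exists to enable this behavior: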

>>> "{0:<10}".format("foo")
'foo       '

It may be possible to set format->fill_char = ' '; as the default in parse_internal_render_format_spec() at ./Objects/stringlib/formatter.h:186, but there's some bit about backwards compatibility that checks for '\0' later on. In any case, my curiosity is satisfied. I will accept someone else's answer if it has more history or a better explanation for why than this.
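To make the sentinel collision concrete, here is a minimal Python sketch of the logic in the C snippet above. It is only a model, not CPython's code: the names `UNSET`, `parse_fill`, and `pad_left_aligned` are made up for illustration, and the parser is simplified to handle just an optional fill character followed by an alignment character.

```python
# Hypothetical model of the 2.7.3 behaviour; not CPython's actual implementation.
UNSET = "\0"  # the same value the C code treats as "no fill character given"


def parse_fill(spec):
    """Return the fill char from a spec such as 'X<10', or UNSET if absent."""
    if len(spec) >= 2 and spec[1] in "<>^=":
        return spec[0]
    return UNSET


def pad_left_aligned(value, spec_fill, width):
    # Mirrors: format->fill_char == '\0' ? ' ' : format->fill_char
    fill = " " if spec_fill == UNSET else spec_fill
    return value + fill * max(0, width - len(value))


no_fill = pad_left_aligned("bbb", parse_fill("<10"), 10)
null_fill = pad_left_aligned("bbb", parse_fill("\x00<10"), 10)
other_fill = pad_left_aligned("bbb", parse_fill("\x01<10"), 10)

print(repr(no_fill), repr(null_fill), repr(other_fill))
```

In this model an explicit "\x00" fill and a missing fill both arrive at the padding step as the same value, so both come out as spaces, while "\x01" passes through untouched.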

bonsaiviking
2

The answer to the original question is that it was a bug in Python.

The behavior was documented as being permitted, but it did not work. It was fixed in 2014. For Python 2, the fix first appeared in either 2.7.7 or 2.7.8 (I'm not sure how to tell which).

Original tracked issue.
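A quick way to check whether a given interpreter includes the fix is to rerun the test cases from the question and compare against plain concatenation. This is a small sketch assuming a recent Python 3; on an affected 2.7 build the comparisons would be expected to come out False instead.

```python
import sys

# The two failing cases from the question, plus the known-good concatenation.
padded = "{0:\x00<10}".format("bbb")
nested = "{0:{1}<10}".format("bbb", chr(0))
expected = "bbb" + "\x00" * 7

print(sys.version_info[:3])
print(repr(padded), padded == expected)
print(repr(nested), nested == expected)
```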

aniski
sbrodie
0

Because the str.format method in Python 2.7 is a backport of Python 3's str.format. Python 2.7's unicode is Python 3's str, and Python 2.7's str is Python 3's bytes. A string is the wrong type for expressing binary data in Python 3; you would use bytes, which has no format method. So really you should be asking why the format method is on str at all in 2.7, when it should only have been on the unicode type, since that is what became the string in Python 3.

I guess the answer is that it is too convenient to have it there.

A related matter: why there is no format method on bytes yet.
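The str/bytes split described here can be seen directly in Python 3. This is only an illustrative sketch of the type difference, not a recommendation for the original question; it assumes a Python 3 interpreter, and the variable names are made up for the example.

```python
# In Python 3, text (str) has .format, binary data (bytes) does not.
text_has_format = hasattr(str, "format")
bytes_has_format = hasattr(bytes, "format")

# Binary data is padded with bytes methods rather than a format spec.
padded = b"bbb".ljust(10, b"\x00")

print(text_has_format, bytes_has_format, padded)
```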

cmd
  • This yields the same result: `u"{0:\x00<10}".format(u"bbb")`. The source code shows that the unicode and string types use the same formatter. – bonsaiviking May 24 '13 at 19:47
  • @bonsaiviking yes, the point is that `.format` is not for binary data and shouldn't be used for binary data. Trying to use the unicode method for binary data isn't going to work well. – cmd May 24 '13 at 20:37