I am trying to write a specific number of bytes of a string to a file. In C, this would be trivial: since each character is 1 byte, I would simply write however many characters from the string I want.

In Python, however, since apparently each character/string is an object, they are of varying sizes, and I have not been able to find how to slice the string at byte-level specificity.

Things I have tried:

Bytearray: (For $, read >>>, which messes up the formatting.)

$ barray = bytearray('a')
$ import sys
$ sys.getsizeof(barray[0])
24

So turning a character into a bytearray doesn't turn it into an array of bytes as I expected, and it's not clear to me how to isolate individual bytes.

Slicing byte objects as described here:

$ value = b'a'
$ sys.getsizeof(value[:1])
34 

Again, a size of 34 is clearly not 1 byte.

memoryview:

$ value = b'a'  
$ mv = memoryview(value)  
$ sys.getsizeof(mv[0])  
34  
$ sys.getsizeof(mv[0][0])  
34  

ord():

$ n = ord('a')  
$ sys.getsizeof(n)  
24  
$ sys.getsizeof(n[0])  

Traceback (most recent call last):  
  File "<pyshell#29>", line 1, in <module>  
    sys.getsizeof(n[0])  
TypeError: 'int' object has no attribute '__getitem__'  

So how can I slice a string into a particular number of bytes? I don't care if slicing the string actually leads to individual characters being preserved or anything as with C; it just has to be the same each time.

user124384

1 Answer

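Make sure the string is a byte string. In Python 2.7 a plain `str` is already a sequence of bytes; a `unicode` object has to be encoded first, e.g. with `.encode('utf-8')`.

Then just slice the string and write the result to a file opened in binary mode. Each index in the slice counts one byte, not one character.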

In [26]: s = '一二三四'

In [27]: len(s)
Out[27]: 12

In [28]: with open('test', 'wb') as f:
   ....:     f.write(s[:2])
   ....:

In [29]: !ls -lh test
-rw-r--r--  1 satoru  wheel     2B Aug 24 08:41 test
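
For readers on Python 3, where `str` holds Unicode text rather than bytes, here is a rough sketch of the same idea. It assumes UTF-8 as the encoding, and `write_prefix` and the temporary file path are names made up for illustration; they are not part of the session above.

```python
import os
import tempfile

text = '一二三四'             # 4 characters
data = text.encode('utf-8')   # 12 bytes: each of these characters takes 3 bytes in UTF-8

def write_prefix(path, payload, n):
    """Hypothetical helper: write the first n bytes of payload to path."""
    with open(path, 'wb') as f:       # binary mode, so bytes are written unchanged
        return f.write(payload[:n])   # slicing a bytes object counts bytes

path = os.path.join(tempfile.mkdtemp(), 'test')
written = write_prefix(path, data, 2)

print(len(text), len(data), written, os.path.getsize(path))
```

Note that cutting at 2 bytes splits the first character in the middle, so the file does not hold valid UTF-8 text. That matches what the question asks for (a fixed byte count, not preserved characters).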
satoru
  • Wow. I did not realize write() and splitting a string like that could do that. What threw me off was that sys.getsizeof(s[:1]) was returning 34. Thank you! – user124384 Aug 24 '15 at 02:30
  • @user124384 What `sys.getsizeof` tells you is the size of the whole `str` object, not that of the underlying byte array. – satoru Aug 24 '15 at 02:41
  • I see. How is it possible to get the size of the underlying byte array? As I showed in my examples, just using `getsizeof()` on a bytearray doesn't seem to do it. – user124384 Aug 24 '15 at 02:58
  • @user124384 Since a `str` is just a byte array in Python 2, why not just use `len`? – satoru Aug 24 '15 at 03:00
  • Ahh. I knew `len()` returned the number of characters, but I didn't realize a character was just one byte as in C. Thank you again! – user124384 Aug 24 '15 at 03:09
  • @user124384 Practically it doesn't return the number of characters, as you can see in my example above. – satoru Aug 24 '15 at 05:42
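
To make the distinction from the comments concrete, here is a small sketch comparing `len()` with `sys.getsizeof()` on a one-byte slice. It is written for Python 3 with UTF-8 assumed as the encoding, and the variable names are illustrative only. The exact `getsizeof` figure depends on the interpreter and platform, so none is shown here.

```python
import sys

data = '一二三四'.encode('utf-8')   # 12 bytes of payload
one = data[:1]                      # a bytes object holding exactly 1 byte

payload_len = len(one)              # number of data bytes in the slice
object_size = sys.getsizeof(one)    # memory used by the whole Python object,
                                    # header included; implementation-dependent

print(payload_len, object_size)
```

The second number is larger because it includes the object's bookkeeping overhead, which is why the `getsizeof` results in the question never came out as 1.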