
I am writing a Python script.

I have a string, the len() of the string is 1048576 and the sys.getsizeof() of the string is 1048597.

However, when I write this string to a file, the file is 1051027 bytes. My code is below. Can anyone tell me why the size of the file differs from the size of the string?

import sys

print type(allInOne)  # allInOne is my string
print len(allInOne)
print sys.getsizeof(allInOne)
newFile = open("./all_in_one7.raw", "w")
newFile.write(allInOne.encode('ascii'))
newFile.close()

My string, allInOne, is the result of a lot of earlier processing. It was generated like this: allInOne = numpy.uint8(dataset.pixel_array).tostring(), where dataset.pixel_array is of type numpy.ndarray. I don't know whether this information is of any help.

martineau
Summer Sun

1 Answer


Your allInOne = numpy.uint8(dataset.pixel_array).tostring() doesn't look like text; it looks like raw binary data. When writing anything but text to a file in Python, you need to open the file in binary mode ("wb" instead of "w"). Otherwise, on Microsoft Windows, Python assumes the 0x0A bytes are '\n' line endings and converts each of them to the '\r\n' line ending used there, which adds one byte to the file per occurrence.
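A minimal sketch of the binary-mode write, written for Python 3 (the question's code is Python 2, where "wb" applies in the same way). The payload and the temporary file path are hypothetical stand-ins for allInOne and all_in_one7.raw, not the asker's data.

```python
import os
import tempfile

# Hypothetical stand-in for allInOne: raw bytes that include 0x0A values.
payload = bytes(range(256)) * 4

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "all_in_one.raw")

# "wb" writes the bytes unchanged, with no newline translation on any platform.
with open(path, "wb") as new_file:
    new_file.write(payload)

file_size = os.path.getsize(path)
print(len(payload), file_size)
```

With binary mode, the size on disk should match len() of the data exactly.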

To see if this is your problem, count that particular character:

print len(allInOne), "bytes"
print len(allInOne) + allInOne.count('\n'), "bytes with 0A counted twice"
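The same check can be sketched in Python 3 by simulating the text-mode translation on a byte string. The helper name and the sample bytes are hypothetical; only the two sizes at the end come from the question.

```python
def size_after_newline_translation(data: bytes) -> int:
    """Return the size of data once every 0x0A byte becomes 0x0D 0x0A."""
    return len(data.replace(b"\n", b"\r\n"))


# Hypothetical sample: five bytes, three of which are 0x0A.
sample = b"\x00\x0a\xff\x0a\x0a"
translated_size = size_after_newline_translation(sample)

# Each 0x0A byte adds exactly one extra byte.
assert translated_size == len(sample) + sample.count(b"\n")

# Sizes reported in the question: file size minus string length.
extra_bytes = 1051027 - 1048576
print(extra_bytes)
```

If newline translation is the cause, allInOne.count('\n') should equal that difference of 2451 bytes.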
Damian Yerrick