3

This question might look silly, but I have a /tmp/size.txt with this content:

hello

and os.path.getsize('/tmp/size.txt') says 6 but when I do:

sys.getsizeof(b'hello')
# OR
sys.getsizeof(bytes(bytearray('hello')))
# OR
sys.getsizeof(bytes('hello'))

it returns 42.

What is the difference between os.path.getsize and sys.getsizeof?

Afshin Mehrabani
  • 1
    `getsizeof()` returns the size of the queried object in memory, including the structural overhead. It is often larger than the data stored in that object. In your case `len()` should be more helpful. – Klaus D. Dec 02 '16 at 11:51

2 Answers

7

The two are not comparable in Python. os.path.getsize gives the size of a file, whereas sys.getsizeof gives the size of an object.

The file is 6 bytes, not 5, because of a line-ending (on Windows it might be 7 bytes). If you were using C then "hello" would be 6 bytes because a binary zero '\0' marks the end of the string. If you were using another language then it too would have its own red-tape memory overhead.

The memory occupied by the data is (generally) less than that occupied by an object. An object will include other information about the data, like its size and location. It is a price you pay for using a high-level language.
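As a rough illustration of the three numbers involved, here is a minimal sketch assuming Python 3 on CPython. The temporary file stands in for the question's /tmp/size.txt, and the variable names are made up for the example. The exact `sys.getsizeof` figure varies by Python version and build, so the sketch only compares it with the payload length rather than expecting 42.

```python
import os
import sys
import tempfile

# Five letters plus one newline, mirroring the file in the question.
data = b"hello\n"

# Write in binary mode so no line-ending translation takes place.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

file_size = os.path.getsize(path)   # bytes stored on disk
payload_size = len(data)            # bytes of content held by the object
object_size = sys.getsizeof(data)   # content plus the interpreter's bookkeeping

os.remove(path)

print(file_size, payload_size, object_size)
```

The first two values agree because the file holds exactly the bytes that were written; the third is larger because it also counts the object header.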

cdarke
  • and how can I find the file size of an object in the memory? say a bytearray – Afshin Mehrabani Dec 02 '16 at 11:55
  • Size of an object or size of its data? Not the same thing! `sys.getsizeof` gives the size of an object, including its data. The size of the data is academic, and implementation dependent. Why do you need it? – cdarke Dec 02 '16 at 11:57
  • well, let's say I have `bytes(bytearray('hi'))` and I want to determine the file size of this bytearray if I write it to disk. The thing is, I have the object in memory and I want to know its size on disk without actually writing it. – Afshin Mehrabani Dec 02 '16 at 11:58
  • Are you going to serialise the data, i.e. you want your file to have 'hi' in it, or are you going to pickle the object? – cdarke Dec 02 '16 at 12:01
  • no, I don't serialize the data. say a `size.txt` with `hi` in it – Afshin Mehrabani Dec 02 '16 at 12:03
  • OK, then use `len(variable-name)`. But don't forget that unless you open the file for binary access, line endings might be added (depending on how you write to the file). – cdarke Dec 02 '16 at 12:03
  • i see, this is what I was thinking as a solution. and one more question, does this solution support different encodings, e.g. utf8? – Afshin Mehrabani Dec 02 '16 at 12:04
  • You can't store multi-byte characters that easily in a bytearray. `len()` on a *string* gives the number of *characters*; on a bytes or bytearray object it is the number of bytes. – cdarke Dec 02 '16 at 12:08
  • This is possibly a duplicate of http://stackoverflow.com/questions/4967580/how-to-get-the-size-of-a-string-in-python – cdarke Dec 02 '16 at 12:11
  • See also http://stackoverflow.com/questions/6714826/how-can-i-determine-the-byte-length-of-a-utf-8-encoded-string-in-python – cdarke Dec 02 '16 at 12:12
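The point in the comments about `len()` and encodings can be sketched as follows, assuming Python 3, where `str` holds characters and `bytes` holds encoded data. The sample text and variable names are only for illustration. The length of the encoded bytes is what a binary-mode write would put on disk.

```python
# "h\u00e9llo" is "hello" with an accented e, which UTF-8 stores as two bytes.
text = "h\u00e9llo"
encoded = text.encode("utf-8")

char_count = len(text)      # number of characters in the string
byte_count = len(encoded)   # number of bytes a binary-mode write would produce

# A bytearray reports its length in bytes as well.
raw = bytearray(b"hi")
raw_count = len(raw)

print(char_count, byte_count, raw_count)
```

With a different encoding, such as UTF-16, the byte count would change while the character count stays the same, so the encoding has to be chosen before the on-disk size can be predicted.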
0

os.path.getsize returns the file size in bytes.

sys.getsizeof returns the number of bytes needed to store the str/bytes object in memory, which includes overhead beyond the actual content because of the object's structural data.

Dean Fenster
  • alright, and how can I find the file size of an object in the memory? say a bytearray – Afshin Mehrabani Dec 02 '16 at 11:53
  • `sys.getsizeof` will return the amount of memory the object takes up. – Dean Fenster Dec 02 '16 at 11:53
  • "file size of an object in memory" does not really make sense to ask for. This depends on how you store the object in memory and how you store it in the file, which need not be the same. You could ask for the size of the data stored in the object - if that can be measured, there might be an answer there... – kratenko Dec 02 '16 at 12:00