75

Is there a way to find the size of a file object that is currently open?

Specifically, I am working with the tarfile module to create tarfiles, but I don't want my tarfile to exceed a certain size. As far as I know, tarfile objects are file-like objects, so I imagine a generic solution would work.

kmario23
strider1551

5 Answers

127
$ ls -la chardet-1.0.1.tgz
-rwxr-xr-x 1 vinko vinko 179218 2008-10-20 17:49 chardet-1.0.1.tgz
$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 22:53:39)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> f = open('chardet-1.0.1.tgz','rb')
>>> f.seek(0, os.SEEK_END)
>>> f.tell()
179218L

Adding ChrisJY's idea to the example

>>> os.fstat(f.fileno()).st_size
179218L

Note: Based on the comments, f.seek(0, os.SEEK_END) must be called before f.tell(). A freshly opened file starts at position 0, so without the seek f.tell() would return 0. f.seek(0, os.SEEK_END) moves the file object's position to the end of the file, and f.tell() then reports that position, which equals the file size in bytes.
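
For readers on Python 3, the same idea can be sketched as follows. This is an illustration rather than part of the original session: the temporary file and the helper name demo_sizes are assumptions made for the demonstration. It shows that a freshly opened file reports position 0, that seek() in Python 3 returns the new position, and that os.fstat agrees with both.

```python
import os
import tempfile


def demo_sizes(payload=b"x" * 1000):
    """Write payload to a temporary file and measure its size three ways."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(payload)
        with open(path, "rb") as f:
            start = f.tell()                    # 0: position right after open()
            via_seek = f.seek(0, os.SEEK_END)   # Python 3: seek() returns the new position
            via_tell = f.tell()                 # same value; this form also works in Python 2
            via_fstat = os.fstat(f.fileno()).st_size
            f.seek(0)                           # rewind if the file will be read afterwards
        return start, via_seek, via_tell, via_fstat
    finally:
        os.remove(path)


print(demo_sizes())  # (0, 1000, 1000, 1000)
```

The file is opened in binary mode on purpose: in text mode, tell() returns an opaque number that is not guaranteed to be a byte count.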

Freek de Bruijn
Vinko Vrsalovic
  • http://docs.python.org/library/stat.html#stat.ST_SIZE `os.fstat` return `stat` structure, please use `st_size` – shevski Oct 11 '11 at 10:00
  • 2
    Can someone shed some light on the magic of `f.seek(0,2)`? Why `tell()` returns 0 without it? – previous_developer Aug 31 '15 at 11:45
  • 9
    @m_poorUser `f.seek(0, 2)` moves the file object's position to 0 bytes from the end of the file, so the file object's position is at the end of the file. Then, `f.tell()` returns the current file object's position, which is the size of the file in this case. See https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects – EarlCrapstone Sep 23 '15 at 13:14
  • 1
    `f.seek(...)` returns the absolute position. No need to follow with `f.tell()`. Try this: `print(f.seek(0, 2))` and you will see. – IAbstract Mar 04 '16 at 00:04
  • 3
    @IAbstract - that's new in Python3. In Python2 `f.seek` returns nothing, regardless of which arguments you pass to it. As such, the `f.tell()` should be kept as it's needed! – hjc1710 Mar 22 '16 at 21:23
  • 2
    In Python 3.6, while you may use `.tell()` to estimate file size with `BufferedIO` and `RawIO`, for TextIO it returns, by definition, the current stream position as an opaque number, and that number does not usually represent a number of bytes in the underlying binary storage. FYI. – Devy Jan 09 '17 at 16:36
  • 10
    The example would be more clear if `f.seek(0, 2)` was written as `f.seek(0, os.SEEK_END)`. – Juuso Ohtonen Sep 03 '18 at 05:37
  • `f.seek(0, os.SEEK_END); file_size = f.tell()` is good. `f.seek(...)` does not return anything. https://docs.python.org/2/library/stdtypes.html#file.seek – Jerry101 Nov 07 '18 at 20:07
  • 2
    You don't need `tell()` because `seek()` already returns the position it has been set to. – Bachsau Aug 30 '19 at 14:20
  • Write `f.seek(0,0)` after `file_size = f.seek(0,2)` if you plan to use the file later. – F. Vosnim Jul 05 '20 at 22:03
  • Also don't forget to seek back to the start with f.seek(0), otherwise you will not get any data when you read: f.seek(0, os.SEEK_END); file_size = f.tell(); f.seek(0) – Cornea Valentin Jul 28 '22 at 08:05
13

Well, if the file object supports the tell method, you can do:

current_size = f.tell()

That will tell you where it is currently writing. If you only write sequentially, this will be the size of the file.

Otherwise, you can use the file system capabilities, i.e. os.fstat as suggested by others.
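
Applied to the tarfile case from the question, one possible sketch is to open the archive on top of an ordinary file object and check that object's tell() after each member is added. The function name add_until_limit and its parameters are hypothetical, and the approach assumes an uncompressed archive written sequentially (mode "w"). The limit is only checked between members, so the result can overshoot by up to one member plus the end-of-archive padding that tarfile writes on close.

```python
import os
import tarfile


def add_until_limit(paths, archive_path, max_bytes):
    """Add files to an uncompressed tar until the bytes written reach max_bytes.

    Hypothetical helper: returns the list of paths that were actually added.
    """
    added = []
    with open(archive_path, "wb") as raw:
        # tarfile writes straight into `raw`, so raw.tell() tracks the archive size.
        with tarfile.open(fileobj=raw, mode="w") as tar:
            for path in paths:
                if raw.tell() >= max_bytes:
                    break  # limit reached; stop before adding another member
                tar.add(path, arcname=os.path.basename(path))
                added.append(path)
    return added
```

With a compressed mode such as "w:gz", the position of the underlying file lags behind what has been handed to the compressor, so this check would be less precise there.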

PierreBdR
  • 3
    `current_size` is a bad variable name since it means *current size* of the file. `tell()` gives the current position of the file stream - that is, where the next read/write will occur. – IAbstract Mar 03 '16 at 14:37
  • 3
    According to the Python 3.6 doc, `.tell() Return the current stream position as an opaque number. The number does not usually represent a number of bytes in the underlying binary storage.` – Devy Jan 09 '17 at 16:26
  • 2
    @Devy [only if the file is opened in text mode](https://docs.python.org/3.6/tutorial/inputoutput.html?highlight=tell#methods-of-file-objects). – ebk Mar 06 '20 at 04:05
7

If you have the file descriptor, you can use fstat to find out the size, if any. A more generic solution is to seek to the end of the file, and read its location there.
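
A minimal sketch of combining the two approaches, assuming Python 3 and a seekable object opened in binary mode. The helper name file_size is hypothetical. It tries fstat first and falls back to seek/tell when the object has no usable file descriptor, as is the case for io.BytesIO.

```python
import io
import os


def file_size(f):
    """Return the size in bytes of an open binary file-like object."""
    try:
        # Works for objects backed by a real file descriptor.
        return os.fstat(f.fileno()).st_size
    except (AttributeError, OSError):
        # No usable descriptor (e.g. io.BytesIO): fall back to seek/tell.
        pos = f.tell()
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(pos)  # restore the original position
        return size


print(file_size(io.BytesIO(b"hello")))  # 5
```

One caveat: for a file that is being written, fstat reports what has reached the operating system, so data still sitting in Python's write buffer is not counted until the file is flushed.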

C. K. Young
3

I was curious about the performance implications of both, since once you open a file, the name attribute of the handle gives you the filename (so you can call os.stat on it).

Here's a function for the seek/tell method:

import io
def seek_size(f):
    pos = f.tell()
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(pos) # back to where we were
    return size

With a 65 MiB file on an SSD under Windows 10, this is some 6.5x faster than calling os.stat(f.name).
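
To repeat this kind of comparison on another machine, a rough timing sketch with timeit could look like the following. The function compare_timings, the temporary file, and the default file size and repeat count are assumptions for illustration, not the setup used for the measurement above. Results will vary with the operating system, file system, and Python version.

```python
import io
import os
import tempfile
import timeit


def seek_size(f):
    pos = f.tell()
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(pos)  # back to where we were
    return size


def compare_timings(size_bytes=1024 * 1024, repeats=10000):
    """Return (seek_seconds, stat_seconds) for `repeats` calls of each approach."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"\0" * size_bytes)
        with open(path, "rb") as f:
            # Sanity check: both approaches must agree before timing them.
            assert seek_size(f) == os.stat(f.name).st_size
            seek_t = timeit.timeit(lambda: seek_size(f), number=repeats)
            stat_t = timeit.timeit(lambda: os.stat(f.name), number=repeats)
        return seek_t, stat_t
    finally:
        os.remove(path)


if __name__ == "__main__":
    seek_t, stat_t = compare_timings()
    print("seek/tell: %.4fs  os.stat: %.4fs" % (seek_t, stat_t))
```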

darda
2

Another solution is using StringIO "if you are doing in-memory operations".

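from StringIO import StringIO  # Python 2 module; body.len below relies on it

with open(file_path, 'rb') as x:  # file_path: path of the file to load
    body = StringIO()
    body.write(x.read())  # reads the whole file into memory
    body.seek(0, 0)       # rewind so later reads start at the beginning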

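Now body behaves like a file object, with methods such as body.read().

body.len gives the file size in bytes. Note that len is an attribute of the Python 2 StringIO.StringIO class; the Python 3 io.StringIO and io.BytesIO classes do not provide it.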
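
In Python 3, a comparable in-memory approach can be sketched with io.BytesIO. The helper name load_into_memory is hypothetical. Like the snippet above, it reads the whole file into memory, so it only suits files that comfortably fit in RAM.

```python
import io


def load_into_memory(file_path):
    """Read a file into an io.BytesIO buffer and return (buffer, size_in_bytes)."""
    with open(file_path, "rb") as x:
        body = io.BytesIO(x.read())  # position starts at 0, so no rewind is needed
    # getbuffer() exposes the contents without copying; release the view promptly
    # because the buffer cannot be resized while a view is held.
    with body.getbuffer() as view:
        size = view.nbytes
    return body, size
```

len(body.getvalue()) would give the same number, at the cost of copying the contents once more.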

vestronge