I'm not exactly new to Python, but I do still have trouble understanding what makes something "Pythonic" (and the converse).

So forgive me if this is a stupid question, but why can't I get the size of a file by doing a len(file)?

file.__len__ is not even implemented, so it's not like it's needed for something else? Would it be confusing/inconsistent for some reason if it was implemented to return the file size?

Dr. Kickass
  • (1) In the Python interactive interpreter, execute `import this`. (2) Because implementing that would require reading the file to its end. So you'd better ask the OS to do that for you (e.g. like in [this SO question](http://stackoverflow.com/questions/6591931/getting-file-size-in-python)) – J0HN May 31 '13 at 20:26
  • because someone came up with os.stat and statinfo.st_size – varun May 31 '13 at 20:27

6 Answers

Files have a broader definition, especially in Unix, than you may be thinking. What is the length of a printer, for example? Or a CD-ROM drive? Both are files in /dev, and, in a looser sense, on Windows too.

For what we normally think of as a file, what would its length be? The size of the variable? The size of the file in bytes? The latter makes more sense, but then it gets ickier. Should it be the size of the file's contents, or its size on disk (rounded up to a multiple of the allocation unit size)? The question arises again for sparse files (files with large empty sections that take no space on disk but still count toward the file's normally reported size; supported by some file systems, such as NTFS and XFS).
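
To make the sparse-file case concrete, here is a minimal sketch, assuming a POSIX system whose file system supports sparse files (the temporary file and the 10 MiB gap are arbitrary choices). It creates a file by seeking far past the end and writing a single byte, then compares the logical size (`st_size`) with the space actually allocated (`st_blocks`, which Python documents as a count of 512-byte blocks and which does not exist on Windows).

```python
import os
import tempfile

HOLE = 10 * 1024 * 1024  # 10 MiB gap that is never written


def sparse_demo():
    """Return (logical size, allocated bytes or None) for a freshly made sparse file."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.seek(HOLE)   # jump past the end without writing anything
            f.write(b"x")  # one real byte at offset HOLE
        st = os.stat(path)
        # st_blocks only exists on Unix-like systems; it counts 512-byte blocks.
        blocks = getattr(st, "st_blocks", None)
        allocated = blocks * 512 if blocks is not None else None
        return st.st_size, allocated
    finally:
        os.remove(path)


logical, allocated = sparse_demo()
print("st_size:", logical)  # 10485761
print("allocated on disk:", allocated)
```

On a file system that supports holes, the allocated figure is usually far smaller than `st_size`; on one that does not, the two are close. Either number could reasonably be called the file's "length".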

Of course, the answer to all of those could be, "just pick one and document what you picked." Perhaps that is exactly what should be done, but to be Pythonic, something usually must be clear-cut without having to read a lot of docs. len(string) is mostly obvious (one may ask whether the return value counts bytes or characters), len(array) is obvious, but len(file) is not quite obvious enough.

Charles Burns
  • Worth noting that in Python 3, the strong distinction between `str` (a sequence of code points) and `bytes` (a sequence of bytes) compared to the `unicode`/`str` distinction in Python 2 makes it clearer how `__len__` should be defined for each. – chepner May 31 '13 at 20:48
  • Thanks, this is a great answer. I just gave the solution to @gnibbler because he was the first to point out the technical reason why `__len__` wouldn't work well for a file. – Dr. Kickass May 31 '13 at 21:06

A file object is an iterator over its lines. To find the number of lines you need to read the entire file:

```python
sum(1 for line in file)
```

If you want the number of bytes in a file, use os.stat, e.g.:

```python
import os

os.stat(filename).st_size  # filename is the path to the file
```
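
A short sketch of why a file behaves like an iterator rather than a sized container, using a throwaway temporary file: counting lines consumes the iterator, so a second count returns 0 until you `seek(0)`, whereas `os.stat` answers the byte-size question without reading anything.

```python
import os
import tempfile

# Throwaway file with three lines; written in binary so its size is 19 bytes on every OS.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")
    path = tmp.name

try:
    with open(path) as f:
        first_pass = sum(1 for line in f)   # reads to the end of the file
        second_pass = sum(1 for line in f)  # iterator already exhausted
        f.seek(0)                           # rewind to the beginning
        after_rewind = sum(1 for line in f)
    size_in_bytes = os.stat(path).st_size   # asks the OS; nothing is read
finally:
    os.remove(path)

print(first_pass, second_pass, after_rewind, size_in_bytes)  # 3 0 3 19
```

A `__len__` built on line counting would therefore have to read (and rewind) the whole file every time it is called.
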
Joel Mellon
John La Rooy
  • OK, the iterator thing makes sense. I guess to implement `__len__` for a file it would have to read the file into memory and then do a len() on the buffer. Probably not a great idea. So you can ask the OS, which already knows the file size, hence os.stat. Thanks! – Dr. Kickass May 31 '13 at 20:34

A file object is an iterator, so you can't use len() on it.

To get the size of a file you can use os.stat:

```python
>>> import os
>>> foo = os.stat("abc")
>>> foo.st_size
193L
```

If by size you mean the number of lines, then try these:

```python
len(open("abc").readlines())
```

or

```python
sum(1 for _ in open("abc"))
```
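
Both one-liners above leave closing the file to the garbage collector, and `readlines()` builds a list of every line in memory. A sketch of a helper that streams the file instead and closes it deterministically (the name `count_lines` is hypothetical, not a standard function):

```python
import os
import tempfile


def count_lines(path):
    """Count lines by streaming; memory use does not grow with the number of lines."""
    with open(path, "rb") as f:  # binary mode: no decoding needed just to count
        return sum(1 for _ in f)  # a final line without "\n" still counts


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"a\nb\nc")  # three lines, the last one unterminated
n = count_lines(path)
os.remove(path)
print(n)  # 3
```

Binary-mode iteration splits only on `b"\n"`, so a file that uses bare `\r` line endings would count as a single line; open it in text mode if you need universal newline handling.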

Ashwini Chaudhary

> So forgive me if this is a stupid question, but why can't I get the size of a file by doing a len(file)?

Charles Burns' answer makes a good point about Unix's "everything is a file" philosophy. Although you can always use os.fstat() to get the 'size' of any file descriptor, with something like...

```python
import os

f = open(anything)  # anything: the path of whatever file you want to inspect
size = os.fstat(f.fileno()).st_size
```

...it may not return anything meaningful or useful...

```python
>>> import os, sys
>>> os.fstat(sys.stdout.fileno()).st_size
0
>>> fd1, fd2 = os.pipe()
>>> os.fstat(fd1).st_size
0
```

I think the reason is that a Python file object, or file-like object, is supposed to represent a stream, and streams don't inherently have a length, especially if they're write-only, like sys.stdout.

Usually, the only thing you can guarantee about a Python file-like object is that it will support at least one of read() or write(), and that's about it.
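
If you do want a best-effort length for an arbitrary file-like object, you have to handle both kinds of stream described above. A minimal sketch (the helper name `stream_size` and its policy of returning `None` are assumptions, not a standard API): trust `os.fstat()` only for regular files, fall back to seeking to the end for seekable in-memory streams such as `io.BytesIO`, and give up on pipes and terminals.

```python
import io
import os
import stat


def stream_size(f):
    """Best-effort length of a file-like object, or None if it has none.

    Hypothetical helper: trusts os.fstat() only for regular files and
    otherwise falls back to seeking, for streams that support it.
    """
    try:
        st = os.fstat(f.fileno())
    except (AttributeError, OSError, ValueError):
        st = None  # no OS-level descriptor, e.g. io.BytesIO
    if st is not None:
        # For pipes, ttys and devices st_size is not a meaningful length.
        return st.st_size if stat.S_ISREG(st.st_mode) else None
    if getattr(f, "seekable", lambda: False)():
        pos = f.tell()
        end = f.seek(0, io.SEEK_END)  # seek() returns the new absolute position
        f.seek(pos)                   # restore the caller's position
        return end
    return None


buf = io.BytesIO(b"hello")
buf.read(2)
print(stream_size(buf), buf.tell())  # 5 2

r, w = os.pipe()
with os.fdopen(r, "rb") as reader, os.fdopen(w, "wb"):
    print(stream_size(reader))  # None
```

Note that for a regular file opened for writing, `os.fstat()` does not see data still sitting in Python's write buffer until it is flushed, which is one more reason a single `len()` answer would be hard to define.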

Aya

A simple way to measure the number of bytes would be:

```python
import os

with open('file.bin', 'rb') as f:
    # Seek to the end (offset 0 relative to the end, i.e. whence=2).
    f.seek(0, os.SEEK_END)
    length = f.tell()
```
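
Because `seek()`/`tell()` on a binary file work in bytes, the result matches `os.path.getsize()` rather than the number of decoded characters. A small sketch, assuming UTF-8 text written to a temporary file:

```python
import os
import tempfile

text = "naïve café"  # 10 characters; "ï" and "é" take 2 bytes each in UTF-8
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(text.encode("utf-8"))

try:
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        byte_length = f.tell()  # end position == size in bytes
    size_from_os = os.path.getsize(path)
    with open(path, encoding="utf-8") as f:
        char_length = len(f.read())  # counting characters requires decoding
finally:
    os.remove(path)

print(byte_length, size_from_os, char_length)  # 12 12 10
```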
gepoch
  • Possibly undefined behavior for binary files: https://wiki.sei.cmu.edu/confluence/display/c/FIO19-C.+Do+not+use+fseek%28%29+and+ftell%28%29+to+compute+the+size+of+a+regular+file – Translunar Aug 17 '18 at 20:35

I would say it's because finding the length depends on OS-specific functionality. You can find the length of a file with this code:

```python
import os

os.path.getsize('C:\\file.txt')
```

You could also read the entire file into a string and find the length of the string. However, you would want to be sure that the file is not so large that it eats up all your memory.
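
If you really must measure by reading (for example, for a stream the OS cannot size), reading fixed-size chunks keeps memory use bounded instead of loading the whole file into one string. A sketch; the helper name `count_bytes` and the 64 KiB chunk size are arbitrary choices:

```python
import io

CHUNK_SIZE = 64 * 1024  # arbitrary; any positive size works


def count_bytes(stream):
    """Count the bytes left in a binary stream, reading one chunk at a time."""
    total = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:  # b"" means end of stream
            break
        total += len(chunk)
    return total


print(count_bytes(io.BytesIO(b"x" * 200_000)))  # 200000
```

With a real file you would pass `open(path, 'rb')`. Only one chunk is held in memory at a time, but the whole file is still read, so `os.path.getsize()` remains the cheaper choice for regular files.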

wardd