
Everyone's done this: from the shell, you need some details about a text file (more than just `ls -l` gives you), in particular that file's line count, so:

@ > wc -l iris.txt
 149 iris.txt

I know that I can access shell utilities from Python, but I am looking for a Python built-in, if there is one.

The crux of my question is getting this information without opening the file (hence my reference to the Unix utility `wc -l`).

(Is 'sniffing' the correct term for this, i.e., peeking at a file without opening it?)

doug
  • `wc` does open the file. The only information about the content of a file you can obtain without opening it is its length. – Kevin Reid Mar 24 '12 at 21:50
  • Good question. Maybe a duplicate of: http://stackoverflow.com/questions/845058/how-to-get-line-count-cheaply-in-python I won't vote to close it, though, in case someone comes up with a more novel approach. – mechanical_meat Mar 24 '12 at 21:50
  • @Kevin Reid: completely depends on the file system. Often you can get size, file name, permissions, edit/create date, etc. – orlp Mar 24 '12 at 22:01
  • @bernie--thanks, didn't see that Q--agreed it is close to mine, but isn't directed to getting this info *without* opening the file, which is really the crux of my question. – doug Mar 24 '12 at 22:07
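
To make the point in these comments concrete, here is a minimal sketch of what the file system will report without the content being read. It assumes a throwaway sample file in place of `iris.txt`; the helper name `describe` is made up for illustration. Note that nothing in the result says how many lines the file has.

```python
import os
import tempfile

def describe(path):
    """Return metadata stored by the file system; the content is never read."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,   # length of the file in bytes
        "mtime": st.st_mtime,       # last modification time (seconds since epoch)
        "mode": oct(st.st_mode),    # file type and permission bits
    }

# Hypothetical sample file standing in for iris.txt
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"a\nb\nc\n")
    sample_path = tmp.name

info = describe(sample_path)
os.remove(sample_path)
print(info["size_bytes"])  # 6
```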

2 Answers


You can always scan through it quickly, right?

lc = sum(1 for l in open('iris.txt'))
Lev Levitsky
  • Oh, sorry about missing the _'w/o opening it'_ part. – Lev Levitsky Mar 24 '12 at 21:50
  • No `[]` needed. And there is _no way_ to do it without opening it. This is a good solution. – agf Mar 24 '12 at 21:51
  • This is a nice looking solution but would be better without the square brackets. A generator expression is more memory friendly than a list comprehension. – Raymond Hettinger Mar 24 '12 at 21:53
  • Actually, I wouldn't mind a brief explanation about this. A generator expression is only valid with `()`. So I understand why I can change `[]` to `()`, but omit it altogether? – Lev Levitsky Mar 24 '12 at 22:00
  • @LevLevitsky: look no further than this other SO question: http://stackoverflow.com/questions/9297653/python-generator-expression-parentheses-oddity – mechanical_meat Mar 24 '12 at 22:06
  • Great, thanks @bernie. I can't believe I was looking at that exact part of the docs: **"The parentheses can be omitted on calls with only one argument."** Time to get some sleep, I guess. – Lev Levitsky Mar 24 '12 at 22:10
  • Actually, I'd like a brief explanation of this. Apparently the default `open()` reads lines? I could not find this in the docs. – MarkHu Jul 24 '17 at 21:21
  • @MarkHu iterating over a file object yields one line at a time, whatever the _mode_ argument. By default the I/O is done in "text mode", which additionally means decoding/encoding and newline translation. To get raw bytes, you need to open the file in binary mode explicitly (`'rb'` or `'wb'`). – Lev Levitsky Jul 24 '17 at 21:53
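
A small sketch of the two points raised in these comments, assuming a throwaway sample file in place of `iris.txt`: a generator expression that is the only argument of a call needs no extra parentheses, and iterating over a file object yields lines in text mode and in binary mode alike. The variable names are illustrative only.

```python
import os
import tempfile

# Hypothetical sample file standing in for iris.txt
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"5.1,3.5\n4.9,3.0\n4.7,3.2\n")
    sample_path = tmp.name

with open(sample_path) as f:
    # Generator expression as the sole argument: extra parentheses may be omitted
    bare = sum(1 for _ in f)

with open(sample_path) as f:
    # Explicitly parenthesized generator expression; same result
    parenthesized = sum((1 for _ in f))

with open(sample_path) as f:
    # List comprehension: same count, but the whole list is built in memory first
    listed = sum([1 for _ in f])

with open(sample_path) as f:
    text_line = next(f)     # text mode: a decoded str

with open(sample_path, "rb") as f:
    binary_line = next(f)   # binary mode: raw bytes, still one line at a time

os.remove(sample_path)
print(bare, parenthesized, listed)  # 3 3 3
print(type(text_line).__name__, type(binary_line).__name__)  # str bytes
```

Using `with` also closes the file as soon as the count is done, which the one-liner leaves to the garbage collector.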

No, I would not call this "sniffing". Sniffing typically refers to looking at data as it passes through, like Ethernet packet capture.

You cannot get the number of lines in a file without opening it. The line count is really the number of newline characters ("\n" on Linux) in the file, and the only way to find those is to read the content after open()ing it.
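
As a rough illustration of that definition, the sketch below counts newline bytes by reading the file in binary chunks, which is also what `wc -l` reports. The function name `count_newlines`, the 1 MiB chunk size, and the sample file are assumptions made for the example.

```python
import os
import tempfile

def count_newlines(path, chunk_size=1 << 20):
    """Count newline bytes by reading the file in fixed-size binary chunks."""
    total = 0
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            total += chunk.count(b"\n")
    return total

# Hypothetical sample file whose final line has no trailing newline
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"a\nb\nc")
    sample_path = tmp.name

newline_count = count_newlines(sample_path)
with open(sample_path, "rb") as f:
    iterated_count = sum(1 for _ in f)  # iteration also yields the unterminated last line
os.remove(sample_path)
print(newline_count, iterated_count)  # 2 3
```

The two numbers differ only when the last line is not terminated by a newline, so a count obtained by iterating over the file can be one higher than the `wc -l` figure.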

Jonathon Reinhart
  • That is true about Ethernet packet sniffing. It seems that sniffing is also used in the file-IO context (at least WRT Python's `csv` module): http://docs.python.org/library/csv.html#csv.Sniffer – mechanical_meat Mar 24 '12 at 21:52
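
For reference, a minimal sketch of the `csv.Sniffer` usage mentioned in this comment, with made-up sample text. The sniffer guesses the format from a sample of the content, so this kind of sniffing still means reading data from the file.

```python
import csv

# Made-up sample; in practice this would be the first chunk read from a file
sample = (
    "sepal_length;sepal_width;species\n"
    "5.1;3.5;setosa\n"
    "4.9;3.0;setosa\n"
)

dialect = csv.Sniffer().sniff(sample)   # guess the CSV dialect from the sample
rows = list(csv.reader(sample.splitlines(), dialect))
print(dialect.delimiter)
print(len(rows))
```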