2

Is there a way to figure out where in a file a program is reading from? It seems like it might be doable with strace or dtrace.

To clarify the question and give motivation, say I have a 10GB log file and am counting the number of unique lines:

$ cat log.txt | sort | uniq | wc -l

Can I check where in the file cat currently is, effectively giving the progress of the command? Using lsof, I can't seem to get the offset of the last read, which I think would do the trick:

$ lsof log.txt
COMMAND   PID USER   FD   TYPE DEVICE    SIZE/OFF       NODE NAME
cat     16021 erik    3r   REG   0,22 13416118210 1078133219 

Edit: I apologize, the example I gave is too narrow and misses the point. Ideally, for an arbitrary program, I would like to see where in the file reads are occurring (regardless of pipe).

erikreed
  • So you're wanting to snoop I/O of a 3rd party process? I'm not seeing a reason for wanting to monitor the progress of cat. Is there a more real-world explanation of what you're really after? – Randy Howard Mar 27 '13 at 21:22
  • This is my real world explanation. I've been running this command for about 30 minutes now and it would be nice to see the progress it has made through the file. Another example: a user is downloading a large file from a web server, how can I check where the last read was to determine the progress of the download? – erikreed Mar 27 '13 at 21:24

3 Answers

3

You can do this with the progress command. It shows how far coreutils tools such as cat, and other programs, have read through the files they have open.

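On Linux, file and offset information is available in /proc/<PID>/fd (symlinks to the open files) and /proc/<PID>/fdinfo (per-descriptor details, including the current offset in the pos field).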
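As a rough sketch of reading those /proc files by hand, assuming Linux with /proc mounted, GNU stat, and permission to inspect the target process (the function names fd_offset and fd_progress are made up for illustration, not part of any tool):

```shell
# fd_offset PID FD
# Print the current byte offset of descriptor FD in process PID.
fd_offset() {
    # The "pos:" line of fdinfo holds the offset in bytes.
    awk '/^pos:/ {print $2}' "/proc/$1/fdinfo/$2"
}

# fd_progress PID FD
# Print the file name, offset, size, and percentage read so far.
fd_progress() {
    pid=$1
    fd=$2
    # /proc/PID/fd/FD is a symlink to the open file.
    path=$(readlink "/proc/$pid/fd/$fd")
    # -L follows the symlink so we get the size of the file itself.
    size=$(stat -L -c %s "/proc/$pid/fd/$fd")
    pos=$(fd_offset "$pid" "$fd")
    if [ "$size" -gt 0 ]; then
        echo "$path: $pos / $size bytes ($((100 * pos / size))%)"
    else
        # Pipes, sockets, and empty files have no meaningful size.
        echo "$path: $pos bytes read (size unknown)"
    fi
}
```

With the PID and descriptor from the lsof output in the question, this would be called as fd_progress 16021 3. Note that the offset only reflects how far the process has read, so for a buffered reader it can run slightly ahead of what has actually been processed.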

JanKanis
2

Use pv instead of cat to read the file:

pv log.txt | sort | uniq | wc -l

Or insert pv into the middle of the pipeline, telling it the expected size with -s:

SIZE=$(ls -l log.txt | awk '{print $5}'); cat log.txt | sort | pv -s "$SIZE" | uniq | wc -l
kjprice
  • Awesome, that's a nice one when piping things. But what if piping isn't feasible? e.g. another user is running a process, or the program doesn't read stdin. Edit: also this requires installing pv, which might not be ideal in all circumstances. – erikreed Mar 27 '13 at 21:17
1

If the example is truly your use case, then I'd recommend pipe viewer (pv).

[Image: example from pv's website]

Brian Cain