Since fseek() does not work on pipes, what methods exist for simulating a forward seek? The naive approach is to call fread() and throw away the data read into the buffer. To avoid a huge buffer on a long seek, you would reuse the same buffer over and over, with the final read filling only part of it.

But is this the only approach? Is there another way that avoids the buffer and the potentially multiple reads?

hippietrail

2 Answers

Seeking doesn't make sense on pipes because the input is produced dynamically (not stored on disk). The lseek() system call is not implemented for pipes; calling it on a pipe fails with ESPIPE.
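To see this for yourself, here is a minimal sketch, assuming a POSIX system. The helper names `fd_is_seekable` and `pipe_is_seekable` are made up for illustration; only `pipe()`, `lseek()`, and `close()` are real calls.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd supports lseek(), 0 if the kernel
 * reports ESPIPE (pipe, FIFO or socket), and -1 on any other error. */
int fd_is_seekable(int fd)
{
    assert(fd >= 0);
    /* Asking for the current offset moves nothing, so it is a safe probe. */
    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
        return 1;
    return errno == ESPIPE ? 0 : -1;
}

/* Hypothetical demo: creates a fresh pipe and probes its read end. */
int pipe_is_seekable(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    int result = fd_is_seekable(fds[0]);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A probe like this can also be used at run time to choose between a real seek (regular file) and a read-and-discard loop (pipe).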

Also keep in mind that a pipe is essentially a producer-consumer buffer of limited, fixed size. When it fills up, the producer is suspended until the consumer reads the oldest data.

Blagovest Buyukliev
  • @hippietrail: if there are concerns about a buffer and multiple `read()` calls to skip the data, perhaps it is better to not use a pipe at all. Have the source write to a disk file, then the sink end of the pipe can use `lseek()` family calls. – wallyk Apr 27 '11 at 15:28
  • Of course but sometimes the dynamically produced output is in a known format. – hippietrail Apr 27 '11 at 15:29
  • @wallyk: Some reasons I have used pipes in the past include processing XML from huge compressed archives, and processing XML on the fly as it is arriving over the internet. Sometimes what you are looking for requires only a portion of the entire data, sometimes you don't have the disk space to have all such archives lying around uncompressed. – hippietrail Apr 27 '11 at 15:32
  • @hippietrail: here is an attempt to implement seekable pipes in Linux that you might find interesting: http://lkml.indiana.edu/hypermail/linux/kernel/0411.3/0739.html – Blagovest Buyukliev Apr 27 '11 at 15:33
  • @Blagovest Buyukliev: alas, that thread concludes without a solution. – wallyk Apr 27 '11 at 18:12
  • @hippietrail: Then the only reasonable solution is to re-implement the source end of the pipe yourself in a way that is useful for your purposes. – wallyk Apr 27 '11 at 18:14
Yes, it is the only way. I would use a buffer of roughly 1 KB to 8 KB. With a much smaller one, the syscall overhead of read() comes into play; with a much larger one, you will evict useful data from the cache.
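The read-and-discard loop could look something like the sketch below. The names `skip_forward` and `skip_demo` are hypothetical, and the 4096-byte scratch buffer is just one value picked from the range suggested above, not a measured optimum. The demo assumes a POSIX system for `pipe()` and `fdopen()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: discards up to n bytes from stream by reading them
 * into a small reusable scratch buffer. Returns the number of bytes actually
 * skipped, which is less than n if EOF or a read error is hit first. */
size_t skip_forward(FILE *stream, size_t n)
{
    char scratch[4096];   /* assumed size, within the 1 KB to 8 KB range */
    size_t skipped = 0;

    assert(stream != NULL);
    while (skipped < n) {
        size_t want = n - skipped;
        if (want > sizeof scratch)
            want = sizeof scratch;   /* the final read uses part of the buffer */
        size_t got = fread(scratch, 1, want, stream);
        skipped += got;
        if (got < want)
            break;                   /* EOF or error: stop early */
    }
    return skipped;
}

/* Hypothetical demo: writes 16 bytes into a pipe, skips 10 of them and
 * returns the next character read, or -1 on failure. */
int skip_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "0123456789abcdef", 16) != 16) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                   /* no more data: reader will see EOF */

    FILE *in = fdopen(fds[0], "r");
    if (in == NULL) {
        close(fds[0]);
        return -1;
    }
    int next = -1;
    if (skip_forward(in, 10) == 10)
        next = fgetc(in);
    fclose(in);                      /* also closes fds[0] */
    return next;
}
```

Since stdio already buffers the stream, each `fread()` here does not necessarily map to one `read()` syscall. If you work on the raw descriptor instead, the same loop applies with `read()`, and the buffer size matters more.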

R.. GitHub STOP HELPING ICE