Read data faster from files

Question

I have a question about seeking in files.

I have a pcap file and I need to find a specific packet in it. So far, this is my code for finding that packet:

while (!find_the_packet)
{
    /* read the next packet; stop at end of file or on error */
    if (pcap_next_ex(p_pcap, &header, &data) != 1)
        break;
    check_if_the_packet_found();
}

It works fine.

My goal is to find that packet faster, without checking packet by packet until I find it.

So I built a database: a hash map of (key, value) pairs. Let's say that:

key   -> No. of the packet
value -> the packet itself (or the location of the packet)

I also noticed the pcapnav library function: pcapnav_goto_offset(pcapnav_t *pn, off_t offset, pcapnav_cmp_t boundary)

I saw that this function uses fseek, so I think my database is not very helpful, because fseek works serially (correct me if I'm wrong).

So my question:

Does fseek really work serially, reading chunk by chunk? How does it work? I am a bit confused.

If so, is there a faster way to get a specific packet or chunk of data from a pcap file?

Thanks in advance.

Yes you're wrong about [`fseek`](http://en.cppreference.com/w/c/io/fseek), all it does is set the position where the next read/write should happen, basically it's just a variable assignment. — Some programmer dude, May 11 '14 at 08:39
Although it may depend on the underlying file system, fseek (and the following read) is understood to work in nearly constant time on modern implementations. — Marian, May 11 '14 at 08:41
Hi, thanks! This is my confusion: suppose the position is set to a specific point inside the file, say 300 MB from the current position. Will the cursor get to the next read/write position immediately? — user3378689, May 11 '14 at 09:05
In my very simplified understanding a sequential read of 512 bytes requires 1 disk operation and fseek to 300MB offset plus read of 512 bytes requires around 6 disk operations. Of course, the actual time is also influenced by the actual state of I/O caching. — Marian, May 11 '14 at 09:12

score 2 · Answer 1 · edited May 23 '17 at 12:22

fseek only tells the underlying library (libc) where the next read should happen. libc then forwards the request to the operating system (typically via the lseek system call). Neither step scans the file, so the cost does not depend on how far away the offset is. So, to read at a given position you need two system calls (lseek, read) and one copy (which read makes from the filesystem buffers, aka the cache, into the address space of your program).
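As a rough illustration of the index idea from the question (a sketch, not pcapnav's or libpcap's API): for a classic pcap file you can make one sequential pass to record where each packet record starts, and after that fetching packet N is a single fseek plus read. The names build_offset_index and read_packet_at are made up for this example. It assumes the file was written in the host's byte order and is not pcapng. Where long is 32 bits you would need fseeko/off_t (or the platform equivalent) for files over 2 GB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Classic pcap layout: a 24-byte file header, then for each packet a
 * 16-byte record header (ts_sec, ts_usec, incl_len, orig_len) followed
 * by incl_len bytes of data. Assumption: host byte order, not pcapng. */
enum { PCAP_FILE_HDR = 24, PCAP_REC_HDR = 16 };

/* One sequential pass that records the file offset of every packet
 * record. Returns a malloc'ed array (caller frees) and stores its
 * length in *count; returns NULL if memory runs out. */
long *build_offset_index(FILE *f, size_t *count)
{
    size_t cap = 1024, n = 0;
    long pos = PCAP_FILE_HDR;
    uint32_t rec[4];
    long *offsets;

    assert(f != NULL && count != NULL);
    offsets = malloc(cap * sizeof *offsets);
    if (offsets == NULL)
        return NULL;

    while (fseek(f, pos, SEEK_SET) == 0 &&
           fread(rec, sizeof rec, 1, f) == 1) {
        if (n == cap) {
            long *tmp;
            cap *= 2;
            tmp = realloc(offsets, cap * sizeof *offsets);
            if (tmp == NULL) {
                free(offsets);
                return NULL;
            }
            offsets = tmp;
        }
        offsets[n++] = pos;
        pos += PCAP_REC_HDR + (long)rec[2];   /* rec[2] is incl_len */
    }
    *count = n;
    return offsets;
}

/* Jump straight to one indexed record and copy up to cap bytes of its
 * packet data into buf. Returns the number of bytes copied, or -1. */
long read_packet_at(FILE *f, long offset, unsigned char *buf, size_t cap)
{
    uint32_t rec[4];
    size_t len;

    if (fseek(f, offset, SEEK_SET) != 0 ||
        fread(rec, sizeof rec, 1, f) != 1)
        return -1;
    len = rec[2] < cap ? rec[2] : cap;
    return fread(buf, 1, len, f) == len ? (long)len : -1;
}
```

With the index built once, packet number k is read_packet_at(f, offsets[k], buf, sizeof buf), with no packet-by-packet scan.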

If the file you're going to read is smaller than the available RAM and will be mostly cached, you'll benefit from mmap-ing it. In that case you can also ask the operating system to prefetch the file lazily (using madvise or PrefetchVirtualMemory). If the file is larger than the available RAM and/or is accessed only sporadically, read speed will be limited by disk I/O, making the difference between seek+read and mmap insubstantial.
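For the mmap case, a minimal POSIX-only sketch might look like the following. The helper name map_file is made up for this example, and the madvise call is only a hint that the kernel is free to ignore. On Windows the rough counterparts would be a file mapping plus PrefetchVirtualMemory, which is not shown here.

```c
#define _DEFAULT_SOURCE   /* glibc: expose madvise() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns the mapping (or NULL on failure)
 * and stores its length in *len. Release it with munmap(ptr, *len). */
const unsigned char *map_file(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    /* Hint only: ask the kernel to start reading the file in ahead of use. */
    madvise(p, (size_t)st.st_size, MADV_WILLNEED);

    *len = (size_t)st.st_size;
    return p;
}
```

Once mapped, a packet at a known offset is just base + offset, so a packet-number to offset table turns a lookup into plain pointer arithmetic. Pages that are not yet cached still cost a disk read on first touch.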

answered May 11 '14 at 13:06

ArtemGr

Thank you for your helpful answer! Let me see if I understand: if the available RAM is 512 B and I want to seek to 5 GB, will the operating system "take" 5 GB / 512 B chunks one by one, do nothing with them, and only perform the read when it reaches the last chunk? – user3378689 May 12 '14 at 07:58
And is my only chance to make it faster to load the file's content into the program's memory, arrange it in a map (key, value), and then perform the seek very fast (map lookup time)? – user3378689 May 12 '14 at 08:01
@user3378689 No, the filesystem won't scan the file sequentially; it has a map telling it exactly where on the disk the 5 GB offset is (cf. http://en.wikipedia.org/wiki/Inode_pointer_structure). And no, you can't improve the speed by loading the file into RAM if the file is larger than the available RAM. – ArtemGr May 12 '14 at 08:56
