
I have the following problem: I have a large text file in which I need to find particular data.

What is the best way to divide the file so that different threads can search it?

Should it be done by counting newline characters and telling each thread after which newline it should start looking for values?

Any hints would be much appreciated.

BR/T

Tomasz Kaleta
  • Memory map the file and pass separate address ranges to each thread. Worry about the edge case where the data you are looking for crosses an address boundary. Probably not worth doing any threading at all as you will be IO bound not CPU bound; that is reading the file will be the limiting step and multiple threads reading the same file will be slower due to seeking. – Richard Critten Jan 07 '18 at 12:23
  • You should find a hint here: https://stackoverflow.com/questions/34751873/how-to-read-huge-file-in-c – Asesh Jan 07 '18 at 12:24

1 Answer


You could try to map the file (or large enough portions of it) into memory with one thread that does only the reading (for example, by repeatedly calling getline()), and then use multiple threads to search the memory you allocated for the file. How you split it is up to you: if you are looking for a particular character, splitting equally should do the job. If you are searching for a substring, you still split equally, but you also search the vicinity of each split index (from index - size(substring) to index + size(substring)) with one thread of your choosing, so that a match straddling a boundary is not missed. I am by no means an expert, but I would think that a thread doing only the reading would be much faster than the disk. I am waiting for comments to prove me right or wrong on this. Cheers.
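A minimal sketch of that splitting idea in C++17 might look like the following. It is only an illustration under a few assumptions: the whole file fits in memory, matches are merely counted, and the helper names (`read_file`, `count_in_window`, `count_occurrences_parallel`) are hypothetical. Instead of a separate pass around each split index, each thread's window here is extended by size(substring) - 1 characters, which covers the same boundary case and ensures every match is counted exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <numeric>
#include <string>
#include <string_view>
#include <thread>
#include <vector>

// Hypothetical helper: one thread reads the whole file into memory.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Counts all (possibly overlapping) occurrences of `needle` in `window`.
std::size_t count_in_window(std::string_view window, std::string_view needle) {
    std::size_t count = 0;
    for (std::size_t pos = window.find(needle);
         pos != std::string_view::npos;
         pos = window.find(needle, pos + 1)) {
        ++count;
    }
    return count;
}

// Splits `data` into equal chunks, one per thread. Each window overlaps the
// next chunk by needle.size() - 1 characters, so a match that straddles a
// split index is found by the thread owning the chunk where the match starts.
std::size_t count_occurrences_parallel(std::string_view data,
                                       std::string_view needle,
                                       unsigned num_threads) {
    assert(num_threads > 0);
    if (needle.empty() || data.size() < needle.size()) return 0;

    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    const std::size_t window_len = chunk + needle.size() - 1;
    std::vector<std::size_t> counts(num_threads, 0);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        workers.emplace_back([=, &counts] {
            // substr clamps the length at the end of the data.
            counts[t] = count_in_window(data.substr(begin, window_len), needle);
        });
    }
    for (auto& w : workers) w.join();

    return std::accumulate(counts.begin(), counts.end(), std::size_t{0});
}
```

It could be used as `count_occurrences_parallel(read_file("big.txt"), "needle", 4)`, where the file name is a placeholder. As the comments under the question note, reading the file is likely to be the limiting step, so it is worth measuring before assuming the extra threads help.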

CCC