How can multithreading be used to read the file effectively [read by chunk]? Or is there any other method to improve the read speed?
I've since measured the actual results, and multithreading is worth it, contrary to my previous advice here. The un-threaded variant runs in 1m44.711s, the 4-thread one (on 4 cores) runs in 0m31.559s, and the 8-thread one (on 4 cores + HT) runs in 0m23.435s. A major improvement, then - almost a factor of 5 in speedup.
So, how do you split up the workload? Split the input into N chunks (N == thread count), and have every thread except the first seek from its nominal start offset to the first non-word character - that is where its logical chunk begins. Each logical chunk ends at its nominal end boundary, rounded up to the first non-word character after that point, so no word is ever split between two chunks.
Process these chunks in parallel, join all threads back into one, and then have that one thread merge the per-thread results.
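Here is a minimal sketch of that chunking scheme, assuming the file contents are already available as one contiguous buffer (for example via the memory mapping discussed below). The names `is_word_char`, `count_words` and `count_parallel` are hypothetical, not taken from the original code:

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <thread>
#include <unordered_map>
#include <vector>

using Counts = std::unordered_map<std::string_view, std::size_t>;

static bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Count the words that lie entirely within data[begin, end).
static Counts count_words(std::string_view data, std::size_t begin, std::size_t end) {
    Counts counts;
    for (std::size_t i = begin; i < end;) {
        while (i < end && !is_word_char(data[i])) ++i;  // skip separators
        std::size_t start = i;
        while (i < end && is_word_char(data[i])) ++i;   // scan one word
        if (i > start) ++counts[data.substr(start, i - start)];
    }
    return counts;
}

// Assumes n_threads >= 1.
Counts count_parallel(std::string_view data, std::size_t n_threads) {
    // Nominal chunk boundaries, then round each interior boundary up to the
    // first non-word character so that no word straddles two chunks.
    std::vector<std::size_t> bounds(n_threads + 1);
    for (std::size_t i = 0; i <= n_threads; ++i)
        bounds[i] = data.size() * i / n_threads;
    for (std::size_t i = 1; i < n_threads; ++i)
        while (bounds[i] < data.size() && is_word_char(data[bounds[i]]))
            ++bounds[i];

    // Process the chunks in parallel, one result map per thread.
    std::vector<Counts> partial(n_threads);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n_threads; ++i)
        workers.emplace_back([&, i] {
            partial[i] = count_words(data, bounds[i], bounds[i + 1]);
        });
    for (auto& t : workers) t.join();  // sync everything back to one thread

    // Merge on the one remaining thread.
    Counts total = std::move(partial[0]);
    for (std::size_t i = 1; i < n_threads; ++i)
        for (const auto& [word, n] : partial[i]) total[word] += n;
    return total;
}
```

Each thread fills its own map, so there is no locking; all sharing happens in the single-threaded merge at the end.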
The next best thing you can do to improve read speed is to ensure you don't copy data where possible. Read through a memory-mapped file and identify strings by keeping pointers or indices to their start and end, instead of accumulating the bytes.
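As a concrete illustration, here is a minimal POSIX sketch of such a mapping (assuming a Linux/Unix target; `map_file` is a hypothetical helper). The returned string_view points straight into the mapping, so no byte of the file is copied:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Map an entire file read-only and hand back a view over its bytes.
// The mapping is deliberately leaked here; for a one-shot tool the OS
// reclaims it at process exit, otherwise call munmap() when done.
// (Note: mmap rejects a zero-length mapping, so an empty file would
// need a special case.)
std::string_view map_file(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) throw std::runtime_error("open failed");
    struct stat st;
    if (fstat(fd, &st) == -1) { close(fd); throw std::runtime_error("fstat failed"); }
    void* addr = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                      PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) throw std::runtime_error("mmap failed");
    return {static_cast<const char*>(addr), static_cast<std::size_t>(st.st_size)};
}
```

Word boundaries are then found by scanning the view and recording {pointer, length} pairs (i.e. string_views), exactly as the `count_words` sketch above does.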
Is there any better data structure than map that can be employed to produce the output effectively?
Well, since I don't think you'll be using the ordering, unordered_map is a better choice. I would also make it an unordered_map<std::string_view, size_t> - a string_view copies the data even less than a string would.
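To make the difference concrete, here is a small hypothetical comparison; the key constraint is that the string_view keys must point into a buffer (such as the memory-mapped file above) that outlives the map:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

void demo(std::string_view text) {  // text: the mapped file contents
    std::unordered_map<std::string, std::size_t> by_string;
    std::unordered_map<std::string_view, std::size_t> by_view;

    std::string_view word = text.substr(0, 4);  // hypothetical word boundaries

    ++by_string[std::string(word)];  // allocates and copies the word's bytes
    ++by_view[word];                 // stores only a {pointer, length} pair
}
```

Note that hashing a string_view key still has to read every byte of the word, which is relevant to the profiling result below.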
On profiling, I find that 53% of the time is spent finding the exact bucket that holds a given word.