1

I am trying to read a huge file that contains one word (of varying length) per line. I want to read it with multiple threads, where the thread depends on the string length.

For example, thread one reads the lines containing one-letter words, thread two reads the two-letter words, and so on.

Is there any way to achieve this? If so, how will it affect performance?

I found these examples, but I can't put them together.

Reference 1 : Multithread file reading

Reference 2 : How to read files in multithreaded mode?

  • No, it would be like in your reference #2. One thread will read the file, and if the processing is complex, you might pass the lines to different threads for processing. Performance may or may not improve. – Kayaman Feb 01 '17 at 20:33

2 Answers

5

You can use multiple threads; however, it won't be any faster. To find all the lines of a given length, you have to read all the other lines as well.

Is there any way to achieve this?

Read all the lines and ignore the ones you filter out.

What you can do is process different lines in different threads, but whether that helps or is actually slower depends on how CPU-intensive the processing is.
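
For illustration, here is a minimal sketch of that split in Java. It assumes the input file is called words.txt and uses a made-up processByLength method as a stand-in for the real per-word work: a single thread reads every line, and the word length only decides what the submitted task does, not which thread touches the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class ReadAllDispatchByLength {

    public static void main(String[] args) throws IOException, InterruptedException {
        ExecutorService workers =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // Every line has to be read once regardless; the word length only
        // decides what the submitted task does, not which thread reads the file.
        try (Stream<String> words = Files.lines(Paths.get("words.txt"))) {
            words.forEach(word -> workers.submit(() -> processByLength(word)));
        }

        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void processByLength(String word) {
        // Placeholder for the real (ideally CPU-intensive) per-word work.
        switch (word.length()) {
            case 1:  /* one-letter words */  break;
            case 2:  /* two-letter words */  break;
            default: /* everything else  */  break;
        }
    }
}
```

If the per-word work is trivial, submitting tasks like this will be dominated by the reading and the hand-off overhead, which is the "may be slower" case above.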

Peter Lawrey
  • I am trying to compare words to see whether they are anagrams of each other, and I was thinking that categorizing the words by length while reading would help make it faster. However, as you mentioned, having to read all the lines just to find their lengths is an obstacle. – user1060251 Feb 01 '17 at 20:59
  • I guess I have to focus on reading the file in fragments, like in reference #2. Do you have any suggestions for speeding it up? – user1060251 Feb 01 '17 at 21:02
  • 1
    @user1060251 if you want to check if many words are an anagram, sort the letters, and index them in a Map of sorted letters to all the words which are an anagram. This will give you O(n) time complexity. – Peter Lawrey Feb 01 '17 at 21:13
  • My program does exactly the same thing with a single thread: O(n) for the loop and n log(n) for the sort. However, let's say 10 billion lines; how would I scale this? – user1060251 Feb 02 '17 at 10:07
  • @user1060251 *However, let's say 10 billion lines; how would I scale this?* Buy a faster storage system. – Andrew Henle Feb 02 '17 at 10:57
  • @user1060251 in that case it gets more complex: you can break a single file into multiple sections, but that adds a lot of complexity. It is at least an order of magnitude harder than using multiple threads, which is already a challenging problem for many senior developers. It is *much* easier to start with the 10 billion lines split across N files and merge the results at the end. – Peter Lawrey Feb 02 '17 at 11:27
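
To make the comment concrete, here is a rough, single-threaded sketch of the sorted-letters Map that Peter Lawrey describes above (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnagramIndex {

    // Key: the word's letters in sorted order; value: every word seen with those letters.
    private final Map<String, List<String>> index = new HashMap<>();

    private static String sortedLetters(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    public void add(String word) {
        index.computeIfAbsent(sortedLetters(word), k -> new ArrayList<>()).add(word);
    }

    public List<String> anagramsOf(String word) {
        return index.getOrDefault(sortedLetters(word), Collections.emptyList());
    }
}
```

Adding every word once while reading and then walking the groups avoids comparing each pair of words directly.
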
2

Reading a file in multithreaded mode can only make things slower, since the disk drive has to move its heads between multiple points of reading. Instead, transfer the computational work from the reading thread to worker thread(s).
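
As a rough sketch of that division of labour (assuming Java; words.txt and the empty process method are placeholders), one reading thread feeds a bounded queue and a fixed number of worker threads drain it, which is essentially the pattern from the other answer with the hand-off made explicit:

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReaderWithWorkers {

    // Shutdown marker; assumes the word list never contains empty lines.
    private static final String POISON_PILL = "";

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        int workerCount = Runtime.getRuntime().availableProcessors();

        // Worker threads: take words off the queue and do the CPU-bound work.
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (String word = queue.take(); !word.equals(POISON_PILL); word = queue.take()) {
                        process(word);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }

        // The reading thread (here, the main thread) is the only one touching the disk.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("words.txt"))) {
            for (String line; (line = reader.readLine()) != null; ) {
                queue.put(line);
            }
        }
        for (int i = 0; i < workerCount; i++) {
            queue.put(POISON_PILL);   // one shutdown marker per worker
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }

    private static void process(String word) {
        // Placeholder for the real per-word computation.
    }
}
```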

Alexei Kaigorodov