1

I have a file that consists of several million lines. I need to read it multithreaded, as fast as possible, and each line needs to be sent via an HTTP request. Maybe I should split the file into smaller files and read those. I need some ideas.

Haygo
  • 125
  • 2
  • 12
  • You don't need to read it multithreaded. You need to read it. Multithreading is a possible solution to the performance constraint, not part of the functional requirement. You can read millions of lines a second in Java with a BufferedReader: that should already be fast enough. If it isn't, you need to state why. And if you're writing to a network, that's the rate-limiting step, not reading. – user207421 Sep 08 '14 at 09:54
  • possible duplicate of [How I can use multithreading concept in java to read from multiple files?](http://stackoverflow.com/questions/25714178/how-i-can-use-multithreading-concept-in-java-to-read-from-multiple-files) – Raedwald Sep 08 '14 at 12:02

4 Answers

2

You could use the FileStream.Read method to read a block of text and add it to another result string in a new Thread.
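(Note that FileStream.Read is a .NET API; since the question is about Java, here is a minimal sketch of the same block-splitting idea in Java. The file path, thread count, and the stubbed-out HTTP send are all illustrative assumptions; a real multi-million-line file would be streamed block by block rather than read into memory up front.)

```java
import java.nio.file.*;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class SplitAcrossThreads {
    public static void main(String[] args) throws Exception {
        // Demo input; in real use this would be the large file.
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, "a\nb\nc\nd\ne\nf\ng\n".getBytes());

        // Read all lines up front (fine for a demo; a huge file should be streamed).
        List<String> lines = Files.readAllLines(tmp);
        int threads = 3;
        int blockSize = (lines.size() + threads - 1) / threads;
        AtomicInteger handled = new AtomicInteger();

        // One block of lines per worker thread.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            int from = Math.min(t * blockSize, lines.size());
            int to = Math.min(from + blockSize, lines.size());
            List<String> block = lines.subList(from, to);
            pool.submit(() -> {
                for (String line : block) {
                    // Real code would send the line via HTTP here.
                    handled.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("handled " + handled.get());
        Files.delete(tmp);
    }
}
```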

thijmen321
  • 473
  • 1
  • 3
  • 13
  • Correct me if I understood it wrong. For example, if I execute X threads, I should split my file into X blocks and then each thread would handle its own block of lines and send HTTP requests line by line. Is that right? – Haygo Sep 08 '14 at 09:54
  • 1
    Yes, that's what I meant – thijmen321 Sep 08 '14 at 12:15
1

You don't need to read it from multiple threads, because the bottleneck will be the network bandwidth and not the reading speed of your disk.

Here is an efficient one-liner solution:

Files.copy(Paths.get("/path/to/file.txt"), response.getOutputStream());
icza
  • 389,944
  • 63
  • 907
  • 827
0

Less bandwidth on sending: Most HTTP servers can deliver GZIP-compressed content to a browser if the browser advertises that capability. Supporting this with a GZIPOutputStream on the sending side is almost a one-liner (conditional, depending on the Accept-Encoding header).
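A minimal sketch of that conditional wrapping, with a hypothetical maybeGzip helper standing in for the servlet plumbing; the ByteArrayOutputStream here just simulates the response stream:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipSend {
    // Hypothetical helper: wrap the output in GZIP only if the client accepts it.
    static OutputStream maybeGzip(OutputStream raw, String acceptEncoding) throws IOException {
        if (acceptEncoding != null && acceptEncoding.contains("gzip")) {
            return new GZIPOutputStream(raw);
        }
        return raw;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(); // stands in for the response stream
        try (OutputStream out = maybeGzip(buf, "gzip, deflate")) {
            out.write("line 1\nline 2\n".getBytes(StandardCharsets.UTF_8));
        }
        // A GZIP stream starts with the magic bytes 0x1f 0x8b.
        byte[] bytes = buf.toByteArray();
        System.out.println(String.format("%02x%02x", bytes[0], bytes[1]));
    }
}
```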

Memory-mapped file: You can use a RandomAccessFile and map its channel into a MappedByteBuffer. Then split into blocks by scanning for the first \n after each candidate position to find the exact split point.

In general this kind of parallel reading of a single file is not optimal for the hardware / system software, so you won't get around timing the different solutions.

Input and output in parallel: In fact I would use one thread for reading and another for writing, so they are decoupled. Check which thread has to wait more, and improve that side. I bet it's the network.
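That decoupling is a classic producer/consumer setup; a minimal sketch with a BlockingQueue, where the poison-pill sentinel and the stubbed-out HTTP send are illustrative assumptions:

```java
import java.nio.file.*;
import java.util.concurrent.*;

public class ReadSendPipeline {
    private static final String POISON = "\u0000EOF"; // sentinel telling the consumer to stop

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("lines", ".txt"); // demo input
        Files.write(tmp, "one\ntwo\nthree\n".getBytes());

        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

        // Producer: reads lines and hands them off.
        Thread reader = new Thread(() -> {
            try (var lines = Files.lines(tmp)) {
                lines.forEach(l -> {
                    try { queue.put(l); }
                    catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                });
                queue.put(POISON);
            } catch (Exception e) { throw new RuntimeException(e); }
        });

        // Consumer: sends lines; here it only counts them.
        Thread sender = new Thread(() -> {
            try {
                int sent = 0;
                for (String line = queue.take(); !line.equals(POISON); line = queue.take()) {
                    // Real code would POST the line via HTTP here.
                    sent++;
                }
                System.out.println("sent " + sent);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        reader.start(); sender.start();
        reader.join(); sender.join();
        Files.delete(tmp);
    }
}
```

A bounded queue like this also gives you back-pressure for free: if the network side is slower, the reader blocks instead of filling memory.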

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
-1

Read x lines at a time into a list of line VOs and then send that to an executor to process. That's the best you can do. Tweak the number of executor threads and the number of lines read in one go based on what works for you.
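A sketch of that batching loop, assuming plain Strings as the line VOs; the batch size, pool size, and the stubbed-out HTTP send are the knobs to tune:

```java
import java.io.BufferedReader;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchExecutor {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("lines", ".txt"); // demo input
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) sb.append("line ").append(i).append('\n');
        Files.write(tmp, sb.toString().getBytes());

        int batchSize = 3;                                      // lines per task; tune
        ExecutorService pool = Executors.newFixedThreadPool(4); // thread count; tune
        AtomicInteger processed = new AtomicInteger();

        // Single reader thread fills batches and hands them to the pool.
        try (BufferedReader in = Files.newBufferedReader(tmp)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = in.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    submit(pool, batch, processed);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) submit(pool, batch, processed);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("processed " + processed.get());
        Files.delete(tmp);
    }

    static void submit(ExecutorService pool, List<String> batch, AtomicInteger processed) {
        pool.submit(() -> {
            for (String l : batch) {
                // Real code would send each line via HTTP here.
                processed.incrementAndGet();
            }
        });
    }
}
```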

Nazgul
  • 1,892
  • 1
  • 11
  • 15
  • Why is this a -1 here? Spring Batch uses this approach to function. What's wrong with the approach I suggested here? Random downvotes without reason are big turn-offs. – Nazgul Sep 16 '14 at 13:01