
I want to write a Stream to a file. However, the Stream is big (a few GB when written to a file), so I want to use parallel processing. At the end of the process, I would like to write the result to a file (I am using FileWriter).

I would like to ask whether this can potentially cause any problems in the file.

Here is some code.

The function that writes the stream to a file:

public static void writeStreamToFile(Stream<String> ss, String fileURI) {
    try (FileWriter wr = new FileWriter(fileURI)) {
        ss.forEach(line -> {
            try {
                if (line != null) {
                    wr.write(line + "\n");
                }
            } catch (Exception ex) {
                System.err.println("error when writing file");
            }
        });
    } catch (IOException ex) {
        Logger.getLogger(OaStreamer.class.getName()).log(Level.SEVERE, null, ex);
    }
}

How I use my stream:

Stream<String> ss = Files.lines(path).parallel()
    .map(x -> dosomething(x))
    .map(x -> dosomethingagain(x));

writeStreamToFile(ss, "path/to/output.csv");
asked by Haha TTpro (edited by GhostCat)
  • You will get random order in file. Also there is no need to have different function. You can call forEach after .map only. – NumeroUno Aug 23 '18 at 04:11
  • order is not a problem. I wonder if there are broken format per line or so ? – Haha TTpro Aug 23 '18 at 04:20
  • There should not be a problem with a particular line. – NumeroUno Aug 23 '18 at 04:21
  • 2
    There is no benefit to having multi-threaded input or output. If you want to parallelize processing, have one thread reading data and placing it in queue, the processing threads taking items from the queue and placing result items into an output queue and then a single thread that takes elements from the output queue and writes them to the file. – Jim Garrison Aug 23 '18 at 05:30

3 Answers


Yes, it is OK to use FileWriter the way you are using it. I have some other approaches that may be helpful to you.

As you are dealing with large files, FileChannel can be faster than standard IO. The following code writes a String to a file using FileChannel:

@Test
public void givenWritingToFile_whenUsingFileChannel_thenCorrect() 
  throws IOException {
    RandomAccessFile stream = new RandomAccessFile(fileName, "rw");
    FileChannel channel = stream.getChannel();
    String value = "Hello";
    byte[] strBytes = value.getBytes();
    ByteBuffer buffer = ByteBuffer.allocate(strBytes.length);
    buffer.put(strBytes);
    buffer.flip();
    channel.write(buffer);
    stream.close();
    channel.close();

    // verify
    RandomAccessFile reader = new RandomAccessFile(fileName, "r");
    assertEquals(value, reader.readLine());
    reader.close();
}

Reference : https://www.baeldung.com/java-write-to-file

You can use Files.write with stream operations, as below; the method-reference cast converts the Stream to an Iterable:

Files.write(Paths.get(filepath), (Iterable<String>)yourstream::iterator);

For example:

Files.write(Paths.get("/dir1/dir2/file.txt"),
     (Iterable<String>)IntStream.range(0, 1000).mapToObj(String::valueOf)::iterator);

If you have a stream of custom objects, you can always add a .map(Object::toString) step to apply the toString() method.
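Applied to the stream from the question, that would look like the sketch below (the stream here is a small stand-in for the real pipeline built from `Files.lines(path)`, since `dosomething` is not shown):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamToFile {
    public static void main(String[] args) throws IOException {
        Path out = Paths.get("output.csv");
        // Stand-in for the question's processed stream; the real one would
        // come from Files.lines(path).parallel().map(...)
        Stream<String> ss = Stream.of("a", "b", "c").map(String::toUpperCase);
        // Files.write iterates the Iterable and writes one line per element
        Files.write(out, (Iterable<String>) ss::iterator);
        System.out.println(Files.readAllLines(out));
    }
}
```

Note that `Files.write` handles opening, buffering, and closing the file for you, so there is no explicit FileWriter to manage.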


– Darshan Dalwadi
  • For such a simple operation, there won’t be any performance difference between `FileOutputStream` and `FileChannel`. Further, starting with Java 7, you can open a `FileChannel` directly via `FileChannel.open(…)`, no need to go through the `RandomAccessFile`. But when you want to write text, you should stay with `Reader` and `Writer`, calling `String.getBytes` is only reasonable for small strings, it doesn’t scale. You can open them directly via `Files.newBufferedReader`/`Files.newBufferedReader` and they may be backed by a `Channel`, there is no need to make a distinction here. – Holger Aug 23 '18 at 09:37
  • Also, when you have an existing `byte[]` array, you can simply get a buffer via `ByteBuffer.wrap(strBytes)`; there is no need to allocate a new buffer and copy the array to it. And mind the existence of [the try-with-resources Statement](https://docs.oracle.com/javase/8/docs/technotes/guides/language/try-with-resources.html), so your example can be rewritten to a simple `try(FileChannel c = FileChannel.open(Paths.get(fileName), CREATE, WRITE)) { c.write(ByteBuffer.wrap("Hello".getBytes())); }`. – Holger Aug 23 '18 at 09:42
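Holger's suggestions above, spelled out as a compilable sketch (the file name and content are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.WRITE;

public class ChannelWrite {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the channel even if write() throws;
        // no RandomAccessFile is needed since Java 7's FileChannel.open
        try (FileChannel c = FileChannel.open(Paths.get("hello.txt"), CREATE, WRITE)) {
            // wrap() reuses the existing byte array instead of allocating and copying
            c.write(ByteBuffer.wrap("Hello".getBytes(StandardCharsets.UTF_8)));
        }
        // verify what was written
        byte[] back = Files.readAllBytes(Paths.get("hello.txt"));
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```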

As others have mentioned, this approach should work; however, you should question whether it is the best method. Writing to a file is a shared operation between threads, meaning you are introducing thread contention.

While it is easy to assume that multiple threads will improve performance, for I/O operations the opposite is true. I/O throughput is bounded by the device, so adding threads will not increase it. In fact, the contention will slow down access to the shared resource because of the constant locking and unlocking required to write to it.

The bottom line is that only one thread can write to a file at a time, so parallelizing write operations is counterproductive.

Consider using multiple threads to handle your CPU-intensive tasks, and have all of them post their results to a queue/buffer. A single thread can then pull from the queue and write to your file. This solution (and more detail) was suggested in this answer.
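A minimal sketch of that design, using an ArrayBlockingQueue and a poison-pill sentinel to tell the single writer thread when the producers are done (the names and the toUpperCase stand-in are illustrative, not from the question):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SingleWriter {
    private static final String POISON = "\u0000EOF"; // sentinel marking end of input

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

        // Single consumer: the only thread that ever touches the file
        Thread writer = new Thread(() -> {
            try (BufferedWriter w = Files.newBufferedWriter(Paths.get("out.txt"))) {
                while (true) {
                    String line = queue.take();      // blocks until a line is available
                    if (line.equals(POISON)) break;  // producers are finished
                    w.write(line);
                    w.newLine();
                }
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            }
        });
        writer.start();

        // Producers: the CPU-bound mapping runs in parallel, results go on the queue
        List.of("a", "b", "c").parallelStream()
            .map(String::toUpperCase)                // stand-in for dosomething(x)
            .forEach(s -> {
                try {
                    queue.put(s);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

        queue.put(POISON);  // all producers finished; unblock the writer
        writer.join();
        System.out.println(Files.readAllLines(Paths.get("out.txt")).size());
    }
}
```

The bounded queue also provides back-pressure: if the writer falls behind, `put` blocks the producers instead of letting results pile up in memory.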

Check out this article for more info on thread contention and locks.

– ninge

It is not a problem, provided it is okay for the file to contain the lines in random order. You are processing the content in parallel, not in sequence, so there is no guarantee of the point at which any given line comes in for processing.

That is the only thing to keep in mind here.
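If the order ever does matter, forEachOrdered restores the encounter order on a parallel stream (at the cost of some of the parallel speedup). A small demonstration, unrelated to the question's files:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.IntStream;

public class OrderDemo {
    public static void main(String[] args) {
        List<Integer> ordered = new CopyOnWriteArrayList<>();

        // forEach on a parallel stream gives no ordering guarantee;
        // forEachOrdered guarantees the encounter order is preserved
        IntStream.range(0, 1000).parallel().forEachOrdered(ordered::add);

        System.out.println(ordered.get(0) == 0 && ordered.get(999) == 999);
    }
}
```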

– GhostCat