I have a function that receives a Stream<String>. This stream represents the lines of a file (as returned by Files.lines(somePath)). The file is actually the concatenation of many files into a single file, something like this:

__HEADER__ # for file 1
data
more data
...
__HEADER__ # file 2 starts here
some more data...
...

I need to convert the stream into multiple physical files on the filesystem.

I've tried the simple approach, something along the lines of:

String allLinesJoined = lineStream.collect(Collectors.joining());
// This solution seems to get stuck on the line above ^
String[] files = allLinesJoined.split("__HEADER__");
for (String fileStr : files)
{
    // This function will write each fileStr to a separate file
    // (filename is determined by contents of fileStr)
    writeToPhysicalFile(fileStr);
}

But the input file is about 300 MB (and could get larger), and this solution seems to get stuck on the collect() call. Maybe it would complete if I had more memory...?

Is there a better way to do this, if my starting point is a Stream<String>, or should I start making other changes so that this bit of code can just read through the file line by line, without using the streaming API?

(the order of the lines does matter, in the context of these files)

tl;dr

I need to turn one big file represented as a Stream<String> into many little files. Each little file consists of a __HEADER__ line and all lines after it, up to the next __HEADER__. The current library uses streams to provide the file, but is it even worth trying to do this with streams, or will my life be easier if I change the library to offer non-stream functionality?

FrustratedWithFormsDesigner

1 Answer

Collecting the whole stream into a single String kills the whole idea of streams: it forces the entire file into memory at once.

Try forEachOrdered(), which handles one line at a time, in file order:

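    // Files.lines keeps the file open, so close the stream with try-with-resources
    try (Stream<String> lineStream = Files.lines(Paths.get("your_file"))) {
        // forEachOrdered handles the lines in the order they appear in the file
        lineStream.forEachOrdered(s -> {
            if (s.startsWith("__HEADER__")) {
                // create new file
            }
            else {
                // append to this file
            }
        });
    }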
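To make the two placeholder comments concrete, here is a minimal sketch of one way the lambda could keep track of the file it is currently writing. The class name HeaderSplitter, the outputDir parameter and the part-N.txt naming scheme are all assumptions made for illustration; the question derives the file name from the section's contents, so that part would need to be swapped in.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper: copies each __HEADER__-delimited section to its own file. */
final class HeaderSplitter implements AutoCloseable {
    private static final String HEADER = "__HEADER__";

    private final Path outputDir;
    private BufferedWriter current; // writer for the current section; null before the first header
    private int fileCount = 0;

    HeaderSplitter(Path outputDir) {
        this.outputDir = outputDir;
    }

    /** Handles a single line; intended to be passed to forEachOrdered. */
    void accept(String line) {
        try {
            if (line.startsWith(HEADER)) {
                close(); // finish the previous section, if any
                fileCount++;
                // Assumed naming scheme: part-1.txt, part-2.txt, ...
                Path target = outputDir.resolve("part-" + fileCount + ".txt");
                current = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
            }
            if (current != null) { // lines before the first header are skipped
                current.write(line);
                current.newLine();
            }
        } catch (IOException e) {
            // Lambdas passed to forEachOrdered cannot throw checked exceptions
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }

    /** Splits the stream into files under outputDir and returns how many were written. */
    static int split(Stream<String> lines, Path outputDir) throws IOException {
        try (HeaderSplitter splitter = new HeaderSplitter(outputDir)) {
            lines.forEachOrdered(splitter::accept);
            return splitter.fileCount;
        }
    }
}
```

With this in place, the call site would look something like HeaderSplitter.split(lineStream, Paths.get("out")), where "out" is an existing directory. Only one line and one open writer are held at any time, so memory use should not grow with the size of the input file.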
Alexey Soshin
  • Yes, it doesn't work well with the idea of streams, but I was able to get something working very nicely based on this, and it was faster than re-writing the underlying code, and it performs very nicely as well. :) – FrustratedWithFormsDesigner Sep 21 '16 at 18:52
  • Not that I criticize you personally, by any means! If everyone knew all the solutions, there wouldn't be a StackOverflow to start with. – Alexey Soshin Sep 22 '16 at 15:10