
I have a SQL file that I am trying to parse. I can easily load the whole file, call `String.split(";")`, and properly delimit it. However, as a challenge and a learning exercise, I would like to apply some functional foo and streams to handle a file of arbitrary size. That gets a lot trickier.

Essentially I need to (a rough sketch follows the list):

  1. Load an arbitrary buffer of data (say, a line, since I can use Java's `BufferedReader` to stream those)
  2. Continue to concatenate those lines until there is a `;` delimiter
  3. Emit the concatenated result
  4. Start a new SQL command after the delimiter
  5. Final result: a `List<String>` representing all the SQL commands to execute
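
Here is that rough sketch in plain imperative Java — this is the loop I want to replace with a stream pipeline (the method name and path are placeholders, not real code I have):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Imperative rendering of steps 1-4; readStatements is just an illustrative name.
static List<String> readStatements(String path) throws IOException {
    List<String> statements = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
        String line;
        while ((line = reader.readLine()) != null) {    // 1. read one buffered line at a time
            int delim;
            while ((delim = line.indexOf(';')) >= 0) {  // 2./3. delimiter found: emit the statement
                current.append(line, 0, delim);
                String stmt = current.toString().trim();
                if (!stmt.isEmpty()) {
                    statements.add(stmt);
                }
                current.setLength(0);                   // 4. start a new command
                line = line.substring(delim + 1);
            }
            current.append(line).append('\n');          // no delimiter yet: keep concatenating
        }
        String tail = current.toString().trim();        // anything after the last ';'
        if (!tail.isEmpty()) {
            statements.add(tail);
        }
    }
    return statements;                                  // 5. all commands, in order
}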

Here is a sample:

CREATE SCHEMA HelloWorld;

USE HelloWorld;

CREATE TABLE HelloWorldTest (
  id int PRIMARY KEY NOT NULL AUTO_INCREMENT,
  message VARCHAR(32)
);

INSERT INTO HelloWorldTest (message) VALUES ('Hello world!');
INSERT INTO HelloWorldTest (message) VALUES ('this is a test!');

Inefficient Java to do the job:

 // Assumes Apache Commons IO for IOUtils, plus these static imports:
 //   import static java.util.Arrays.stream;
 //   import static java.util.stream.Collectors.toList;
 String fileData = IOUtils.toString(sqlFile);
 List<String> sqlStats = stream(fileData.split(";"))
         .filter(s -> s.length() > 0)
         .collect(toList());

How can I do this with streams and functional programming so that it handles a file of any size? Non-Java answers are welcome, as Java may not be able to do the job with its native libraries.

Christian Bongiorno
  • Please explain why that's inefficient. Also, instead of `s.length() > 0`, you could use `String#isEmpty` – Vince Oct 18 '17 at 19:05
  • @VinceEmigh I could use `!s.isEmpty()`, sure. Imagine that the SQL file is 400MB or 400GB - loading the whole thing into memory to parse it has vertical scaling limits – Christian Bongiorno Oct 18 '17 at 23:19
  • I was asking *how* is this inefficient? You obviously wouldn't be storing 400gb in memory all at once - it's just not practical. *Why* would you need all that data in memory at once? Load and parse what you need at that given time. This sounds like an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem), should explain why you need a system that can parse 400gb at once, as it may not be needed. Explain your actual problem, rather than the problem of your solution (needing to load & parse 400gb of data at once) – Vince Oct 18 '17 at 23:59
  • See the end of [this answer](https://stackoverflow.com/a/46159070/2711488); to convert to a stream when it really has to be a stream, you may consider [the answer you already got](https://stackoverflow.com/a/46822464/2711488) or [this answer](https://stackoverflow.com/a/40304028/2711488)… – Holger Oct 19 '17 at 07:50

1 Answer


Use a `Scanner`:

// imports: java.io.File, java.util.Scanner, java.util.Spliterator,
//          java.util.Spliterators, java.util.stream.StreamSupport
try (Scanner s = new Scanner(new File("/path/to/file.txt"))) {
    s.useDelimiter(";");
    // A Scanner is an Iterator<String> over its tokens; the token count is unknown up front
    Spliterator<String> splt = Spliterators.spliteratorUnknownSize(s, Spliterator.ORDERED | Spliterator.NONNULL);
    StreamSupport.stream(splt, false).filter(x -> !x.isEmpty()).forEach(executeSql); // executeSql: some Consumer<String>
}
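
On Java 9+, as Holger notes in the comments, `Scanner.tokens()` gives the same stream without the spliterator boilerplate:

try (Scanner s = new Scanner(new File("/path/to/file.txt"))) {
    s.useDelimiter(";")
     .tokens()                       // Java 9+: lazily streams the delimited tokens
     .filter(str -> !str.isEmpty())
     .forEach(executeSql);
}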
Joshua Jones
  • `.filter(String::isEmpty)` will *only* accept empty strings; you surely want the opposite, `.filter(str-> !str.isEmpty())`. Further, a method receiving an existing `Scanner` as argument shouldn’t deal with closing the scanner, that’s the caller’s responsibility. Besides that, it does the job, but you can simplify, e.g. just use `s.useDelimiter(";");`. Note that in Java 9, you can use `s.useDelimiter(";").tokens() .filter(str -> !str.isEmpty()).forEach(…)` to do the same. – Holger Oct 19 '17 at 08:35
  • @Holger you are absolutely correct about everything you said. I will correct the code. – Joshua Jones Oct 19 '17 at 11:50
  • @Holger not entirely what I had in mind but much better than what I was doing. Thanks! – Christian Bongiorno Oct 19 '17 at 19:27