
I have just realised that I have a file that contains only a single line holding one very long string. This file (line) can be 300MB in size. I would like to stream some data from this string and save it in another file, i.e. the line in the file would look like:

String line = "{{[Metadata{"this, is my first, string"}]},{[Metadata{"this, is my second, string"}]},...,{[Metadata{"this, is my 5846 string"}]}}"

Now I would like to take 100 items from this string, from one "Metadata" to the next, save them in a file and continue with the rest of the data. So in a nutshell, from one line I would like to get N files with e.g. 100 Metadata strings each.

BufferedReader reader = new BufferedReader(new StringReader(line));

This is what I've got and I don't know what I can do with the reader.

Probably

reader.read(????)

but I don't know what to put inside :( Can you please help?
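
For context, one possible answer to "what goes inside `reader.read(...)`" is a `char[]` buffer: read the single line in fixed-size chunks straight from the file (so the whole 300MB never sits in memory) and start a new output file after every 100 occurrences of the word `Metadata`. The sketch below is only illustrative: the file names, the `splitIntoFiles` helper and the marker-based splitting are assumptions, not a finished parser for this format.

```java
import java.io.*;

public class MetadataSplitter {

    // Rough sketch: read the one-line file in small chunks and start a new
    // output file after every `itemsPerFile` occurrences of "Metadata".
    public static void splitIntoFiles(File input, int itemsPerFile) throws IOException {
        final String marker = "Metadata";
        char[] buffer = new char[8192];               // never hold the whole 300 MB line in memory
        StringBuilder pending = new StringBuilder();  // text of the current, not-yet-written group
        int itemsSeen = 0;
        int fileIndex = 0;
        int scanned = 0;                              // index in `pending` past the last counted marker

        try (BufferedReader reader = new BufferedReader(new FileReader(input))) {
            int read;
            while ((read = reader.read(buffer)) != -1) {   // <-- this is what goes inside reader.read(...)
                pending.append(buffer, 0, read);
                int hit;
                while ((hit = pending.indexOf(marker, scanned)) != -1) {
                    itemsSeen++;
                    if (itemsSeen > 1 && (itemsSeen - 1) % itemsPerFile == 0) {
                        // a full group of items is complete: everything before this marker
                        // belongs to the previous group, so flush it to its own file
                        writeFile("chunk-" + fileIndex++ + ".txt", pending.substring(0, hit));
                        pending.delete(0, hit);
                        hit = 0;
                    }
                    scanned = hit + marker.length();
                }
            }
            if (pending.length() > 0) {                    // whatever is left is the last (partial) group
                writeFile("chunk-" + fileIndex + ".txt", pending.toString());
            }
        }
    }

    private static void writeFile(String name, String content) throws IOException {
        try (Writer out = new BufferedWriter(new FileWriter(name))) {
            out.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        splitIntoFiles(new File("huge-single-line.txt"), 100);
    }
}
```

Because markers that straddle a chunk boundary stay in the `pending` buffer until the next read, they are still found; memory use is bounded by roughly one group of 100 items rather than the whole line.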

EdXX
  • @TimBiegeleisen That “duplicate” isn’t useful here, since the entire file is on one line. – MTCoster Dec 20 '18 at 15:25
  • @TimBiegeleisen OP's question isn't a duplicate of the answer you linked. You were a bit too hasty here. – xtratic Dec 20 '18 at 15:25
  • 2
    The syntax seems proprietary (i.e. not markup or JSON), so you'll likely need your own parser to separate items. Not going to be a trivial feature. You could also work around by using regex and splitting on `},{`, but that'd be a lot less "solid" (and much uglier) in the end. – Mena Dec 20 '18 at 15:25
  • Note that `BufferedReader` features a `readLine` method, but that's useless to you since the whole contents are in one line, while you're expecting to split them by outer `{}`-enclosed elements. – Mena Dec 20 '18 at 15:27
  • The first question I'd ask myself is "what constitutes an element" in your string, e.g. is it the whole `{}`-enclosed content, or `Metadata{..}`, or the contents of the inner `{}`, etc. etc. Then you can use stack-like structures to parse, returning a `Stream` of POJOs, which you can in turn collect after a given count (e.g. `100` in your case). – Mena Dec 20 '18 at 15:29
  • @EdXX Does this format have a formal specification? Can we assume it's just `{...},{...},{...}` from the root of the document? – xtratic Dec 20 '18 at 15:29
  • @TimBiegeleisen agree with the dupe, but the target question is not very useful tbh. It's a very bad "historical" question that should have been closed back then instead of gathering upvotes, and it definitely doesn't address the parsing requirements here. – Mena Dec 20 '18 at 15:31
  • @xtratic Hopefully. If that's the case, he can just split it into an array and then create his n files with a for loop. – HamBone41801 Dec 20 '18 at 15:36
  • @HamBone41801 But since there are commas within the `{...}` content, `String.split(",")` won't be enough. It needs actual parsing. – xtratic Dec 20 '18 at 15:37
  • @xtratic Yeah, I thought of that right after I commented. He could always remove the extra `{}`, making the code `String.split("},{")`. He would still be left with `[Metadata{"..."}]` in each file. – HamBone41801 Dec 20 '18 at 15:40
  • @xtratic In my real example I have a JSON array with objects, and they start like `{"Metadata":{something}, "Event":{something}}, ..., {"Metadata":{something}, "Event":{something}}`. There are a lot of commas and the like inside the `{}`. – EdXX Dec 20 '18 at 15:44
  • @HamBone41801 If I split the line into an array by "comma", for example, then the whole line needs to be loaded into memory at once, I think. If so, it's not a solution for me, since the file can be 300MB :( – EdXX Dec 20 '18 at 15:48
  • Perhaps use [this](https://stackoverflow.com/questions/30832101/buffered-reader-read-text-until-character) to read chunks at a time and build an object when enough data has been read, then use [this](https://stackoverflow.com/questions/30685623/how-to-implement-a-java-stream) to stream it. – Andrew S Dec 20 '18 at 15:50
  • @EdXX So just to clarify, you want your line split up into groups of 100 `{"Metadata":{something}, "Event":{something}}`'s? If that is the case, what I suggested would still work if you nested a second for loop inside the first and set it to run 100 times. You would also need to create a variable to keep track of your current position in the array now that you've added a for loop, but that's not difficult. – HamBone41801 Dec 20 '18 at 15:54
  • I think we need to be clear about the format first. OP, what is the exact format, really? Is it JSON? Why would you give an example that's different from your real data? `In my real example I have JSON array with objects and they starts like {"Metadata":{something}, "Event":{something}}, ..., {"Metadata":{something}, "Event":{something}}` Your example in your question was not the same as what you are saying now. – xtratic Dec 20 '18 at 15:57
  • @EdXX Is your real file standard JSON or not? If JSON, what exactly is the problem, fear of running out of memory with a 300 meg file? If so, why not just use a streaming/incremental JSON parser? How has this not been covered already on Stack Overflow? – Basil Bourque Dec 20 '18 at 23:50
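
Following up on the comments about the real data being JSON: if the file really is valid JSON whose root is an array of objects, a streaming parser such as Jackson's `JsonParser` reads one object at a time and never holds the whole 300MB line in memory. A rough sketch, assuming Jackson 2.x is on the classpath and using illustrative file names (`huge-single-line.json`, `chunk-N.json`):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

public class JsonChunker {

    public static void main(String[] args) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        JsonFactory factory = mapper.getFactory();
        int itemsPerFile = 100;

        try (JsonParser parser = factory.createParser(new File("huge-single-line.json"))) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected the root to be a JSON array");
            }

            int fileIndex = 0;
            int count = 0;
            PrintWriter out = new PrintWriter("chunk-" + fileIndex + ".json");
            out.print("[");

            // advance object by object; only one {...} object is in memory at a time
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                JsonNode item = mapper.readTree(parser);   // reads exactly one object
                if (count > 0) {
                    out.print(",");
                }
                out.print(item.toString());
                count++;

                if (count == itemsPerFile) {
                    // 100 objects written: close this file and start the next one
                    out.print("]");
                    out.close();
                    fileIndex++;
                    count = 0;
                    out = new PrintWriter("chunk-" + fileIndex + ".json");
                    out.print("[");
                }
            }
            out.print("]");
            out.close();
        }
    }
}
```

Each `readTree` call materialises only a single object, so memory use stays proportional to one item (plus output buffering) regardless of how large the one-line file is.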

0 Answers