
I am working on a project where I will have a binary file. The file is split into multiple sections, each of which represents a list of primitive values. I need a solution where I can have a collection of objects, each of which represents a section of the file. These collections are then all held within a "file" object that represents the file as a whole.

Each collection object will need to provide sequential access to each value in the section of the file it represents. What method would provide the fastest data retrieval without loading all the data into memory first?

Also, it would be nice if two separate collections of the same "file" object could be accessed by two separate threads, but this is less important.

Troy Stopera
  • Depends upon the charset you have as your system default... Please provide that too... – CoderNeji Jun 05 '15 at 05:50
  • @CoderNeji He said it was a binary file. – Kayaman Jun 05 '15 at 05:50
  • How large is "quite large"? You may well find that striving for efficiency costs you huge amounts of effort in code and maintenance that isn't really worth it in the end. – Jon Skeet Jun 05 '15 at 05:52
  • It is just a file that starts with a table that provides the index of each section. Then each section contains numeric values (`shorts` or `ints`), depending on the file. – Troy Stopera Jun 05 '15 at 05:52
  • So you're essentially trying to create a sort of virtual memory (since you don't want to read all the data in the memory). `FileChannel` and `RandomAccessFile` would work, `InputStream` not so well. Also flagging to close the question. – Kayaman Jun 05 '15 at 05:54
  • @Kayaman what is wrong with this question? I'm looking for different ideas and implementations. – Troy Stopera Jun 05 '15 at 05:55
  • Is the file a structured binary file... Like kind of having headers and all... if yes refer to this.... http://stackoverflow.com/questions/277944/best-way-to-read-structured-binary-files-with-java – CoderNeji Jun 05 '15 at 05:57
  • @TroyStopera SO favours straightforward well structured questions that have definitive answers (hence the close type of "Opinion based"). Brainstorming and idea swapping does not [fit well](http://stackoverflow.com/help/on-topic) here. – Kayaman Jun 05 '15 at 06:01
  • @CoderNeji My issue isn't really how to read the files, it's more how to have multiple objects that each represent a section of the file. – Troy Stopera Jun 05 '15 at 06:02
  • With this you can access any large file... http://www.codeproject.com/Questions/543821/ReadplusBytesplusfromplusLargeplusBinaryplusfilepl... Let me see for what you desire.. Wait... – CoderNeji Jun 05 '15 at 06:04
  • @Kayaman I guess I just don't understand the rules. I feel that my question is "a practical, answerable problem that is unique to software development." Let me improve my question... – Troy Stopera Jun 05 '15 at 06:07
  • You can do this... Read the file as a whole first... Then split it according to size and then assign an object to each of the splits... It's the only way of doing it – CoderNeji Jun 05 '15 at 06:10
  • @CoderNeji Why is that the only way? I could keep a pointer to a position in the file in each collection object and then use that to access the data. I rephrased my question; it may provide more detail. – Troy Stopera Jun 05 '15 at 06:13
  • @TroyStopera Most questions that are of the format "What's the best..." "Should I do this or that" etc. are closed. You had 3 potential solutions (though I couldn't understand `InputStream` being there), so your question was basically a "could you decide for me" and the threading question is an entirely separate issue. However, to avoid stringing this out any longer, `FileChannel` would be the most modern and fastest way (since it can memory map parts instantaneously). – Kayaman Jun 05 '15 at 06:14
  • @Kayaman In no way was I asking for people to decide and if it came across that way I apologize. I listed those because I had thought of those myself but I now know that I should keep my previous possible solutions to myself when requesting help from the community. The Threading part was not a question, it was another requirement. And `InputStream` was there because you can read a file with an `InputStream`. Again, sorry for the poor question, and thanks for your help. – Troy Stopera Jun 05 '15 at 06:30
  • Separate binary file data access from the file and section objects - two layers. I'd start with memory mapped i/o (nio's MappedByteBuffer), which is very likely to serve well for your use case. – laune Jun 05 '15 at 06:31
  • @Kayaman Now where should questions like this one be asked? I'm beginning to lose interest in SO fast, because it's questions like this one I find *really* interesting - not "Why does this cause NPE?" and similar. – laune Jun 05 '15 at 06:34
  • @laune Thank you! I looked at MappedByteBuffer. I am curious, though: does MappedByteBuffer load all the data into memory? I would imagine it does, since it is a buffer. – Troy Stopera Jun 05 '15 at 06:34
  • Ah! Excellent question! An OS typically has a feature for mapping a file into the process address space page by page, so only the parts that are actually accessed are read into memory. I've used this on huge files with Java 7 or 8 on Linux, and it worked quite well. – laune Jun 05 '15 at 06:38
  • Care to join me in [chat?](http://chat.stackoverflow.com/rooms/139/java) – Kayaman Jun 05 '15 at 06:39
  • @laune Thanks! I was trying to find a definitive answer to whether it stored it all or how exactly that worked. This is the exact kind of answer I was looking for. – Troy Stopera Jun 05 '15 at 06:44
  • @TroyStopera Do note that [FileChannel.map()](http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#map%28java.nio.channels.FileChannel.MapMode,%20long,%20long%29) returns a `MappedByteBuffer`. – Kayaman Jun 05 '15 at 06:49
  • @Kayaman Yes I knew that. That's why in my original question I said FileChannel/MappedByteBuffer. I'm not trying to say you didn't answer my question, I'm saying that laune answered it in a more respectful way. This is only my 4th or 5th question posted on SO and there is no need to criticize my question when all-in-all it is a good question, it just doesn't meet the strict standards of SO. You could have helped me re-phrase my question instead. It makes me hesitant to post here since SO has such harsh Moderation. – Troy Stopera Jun 05 '15 at 07:00
  • @TroyStopera I had a discussion with laune in the chat (I was hoping you would've joined as well) about this post, posts on SO/programming in general, post quality and other such things. I asked laune to wrap this question up in an answer containing the info in the good comments to improve this question (lots of comments and no answers does not a good question make) and I retracted my close vote. Moderation may seem harsh, but we try our best to keep the quality up. – Kayaman Jun 05 '15 at 07:51
  • @Kayaman I agree that an answer was needed! I did not join the chat because my previous experience with moderators in chat settings has always been negative. I apologize. As for "keeping the quality up", I just feel that moderators are too quick to flag, as opposed to providing constructive feedback that would make the question good "quality". But I will end this now as it is becoming too Meta and I wouldn't want to get flagged ;) – Troy Stopera Jun 05 '15 at 08:06

1 Answer


A good approach is to divide the solution into layers: one for the file I/O, mapping bytes to Java shorts and ints, and another for the abstraction of the file sections and of the file as a whole.
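A minimal sketch of that two-layer split, assuming the layout described in the comments (a header table followed by sections of shorts or ints). The exact header layout used here (an int value width, an int section count, then one `(long offset, long length)` pair per section) and the class names `MappedFileAccess`, `Section` and `SectionFile` are hypothetical; adapt them to the real format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Lower layer: raw byte access. Knows offsets and lengths, nothing about sections.
final class MappedFileAccess implements AutoCloseable {
    private final FileChannel channel;

    MappedFileAccess(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    // Maps [offset, offset + length) read-only; the OS pages data in only when it is touched.
    ByteBuffer map(long offset, long length) throws IOException {
        return channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
    }

    @Override
    public void close() throws IOException {
        channel.close(); // mappings already created stay valid after the channel closes
    }
}

// Upper layer, part 1: one section = a run of shorts or ints, read sequentially.
final class Section {
    private final ByteBuffer data;   // mapping covering exactly this section
    private final int bytesPerValue; // 2 for shorts, 4 for ints

    Section(ByteBuffer data, int bytesPerValue) {
        this.data = data;
        this.bytesPerValue = bytesPerValue;
    }

    int size() {
        return data.capacity() / bytesPerValue;
    }

    // Each iterator reads through its own duplicate(), i.e. its own position, so
    // separate threads can iterate separate sections without sharing a cursor.
    PrimitiveIterator.OfInt iterator() {
        ByteBuffer view = data.duplicate().order(data.order()); // duplicate() starts out big-endian
        return new PrimitiveIterator.OfInt() {
            @Override
            public boolean hasNext() {
                return view.remaining() >= bytesPerValue;
            }

            @Override
            public int nextInt() {
                if (!hasNext()) throw new NoSuchElementException();
                return bytesPerValue == 2 ? view.getShort() : view.getInt(); // shorts widen to int
            }
        };
    }
}

// Upper layer, part 2: the "file" object. Reads the header table, maps each section.
// HYPOTHETICAL header layout: int bytesPerValue, int sectionCount,
// then sectionCount pairs of (long byteOffset, long byteLength).
final class SectionFile implements AutoCloseable {
    private final MappedFileAccess access;
    private final List<Section> sections = new ArrayList<>();

    SectionFile(Path path) throws IOException {
        access = new MappedFileAccess(path);
        ByteBuffer head = access.map(0, 8);
        int bytesPerValue = head.getInt();
        int count = head.getInt();
        ByteBuffer table = access.map(8, 16L * count);
        for (int i = 0; i < count; i++) {
            long offset = table.getLong();
            long length = table.getLong();
            sections.add(new Section(access.map(offset, length), bytesPerValue));
        }
    }

    List<Section> sections() {
        return Collections.unmodifiableList(sections);
    }

    @Override
    public void close() throws IOException {
        access.close();
    }
}
```

Each section holds its own mapping and each iterator works on a `duplicate()` of it, so two threads iterating two different sections never share a position; that covers the optional threading requirement without any locking.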

java.nio's MappedByteBuffer provides a good interface between the "byte array" of a random access file and the typed Java data you need to get out of it.

As Kayaman mentioned, FileChannel.map() returns a MappedByteBuffer, and you can navigate it easily with its methods.
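For example, here is a sketch of mapping a run of ints and navigating it with the buffer's own methods: `order()` sets the byte order, `asIntBuffer()` gives a typed view, and relative `get()` walks it sequentially. The class `MappedIntsDemo` and its method names are hypothetical, and the byte order of the data is assumed to be known in advance.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedIntsDemo {
    // Maps `count` ints starting at byte `offset` and returns a typed IntBuffer view.
    static IntBuffer mapInts(Path file, long offset, int count, ByteOrder order) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer bytes = ch.map(FileChannel.MapMode.READ_ONLY, offset, 4L * count);
            // The view takes the byte order the buffer has at the moment it is created.
            // The mapping stays valid after try-with-resources closes the channel.
            return bytes.order(order).asIntBuffer();
        }
    }

    // Sequential access: relative get() advances a private cursor on a duplicate.
    static long sum(IntBuffer ints) {
        IntBuffer cursor = ints.duplicate(); // the caller's position is left untouched
        long total = 0;
        while (cursor.hasRemaining()) {
            total += cursor.get();
        }
        return total;
    }
}
```

For random access, the absolute `ints.get(i)` reads by index without moving the position; for sections of shorts, `asShortBuffer()` works the same way.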

The implementation should make use of the OS feature that maps memory pages to file pages, so that only the parts of the file you actually access in memory are read from disk. (I've used this recently with Java 8 on Linux, and it performed well even on files larger than a single MappedByteBuffer can hold, which is at most Integer.MAX_VALUE bytes, about 2 GB.)
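One way to handle files beyond that limit, sketched under my own assumptions, is to map the file lazily in fixed-size chunks. `ChunkedIntFile` and its `chunkBytes` parameter are hypothetical; the chunk size is kept a multiple of 4 so that no int crosses a chunk boundary.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Reads ints from a file of any size by mapping it in fixed-size chunks,
// since a single FileChannel.map() call is limited to Integer.MAX_VALUE bytes.
final class ChunkedIntFile implements AutoCloseable {
    private final FileChannel channel;
    private final long fileSize;
    private final int chunkBytes;            // multiple of 4, so no int straddles two chunks
    private final MappedByteBuffer[] chunks; // each chunk is mapped on first access

    ChunkedIntFile(Path path, int chunkBytes) throws IOException {
        if (chunkBytes <= 0 || chunkBytes % 4 != 0) {
            throw new IllegalArgumentException("chunkBytes must be a positive multiple of 4");
        }
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
        this.fileSize = channel.size();
        this.chunkBytes = chunkBytes;
        this.chunks = new MappedByteBuffer[(int) ((fileSize + chunkBytes - 1) / chunkBytes)];
    }

    long length() {
        return fileSize / 4; // number of whole ints in the file
    }

    // Random access by a long index; only the chunk containing it gets mapped.
    synchronized int get(long index) throws IOException {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException(Long.toString(index));
        }
        long byteOffset = index * 4;
        int chunk = (int) (byteOffset / chunkBytes);
        if (chunks[chunk] == null) {
            long start = (long) chunk * chunkBytes;
            long size = Math.min(chunkBytes, fileSize - start); // last chunk may be shorter
            chunks[chunk] = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
        }
        // Absolute getInt(int) does not move the buffer's position.
        return chunks[chunk].getInt((int) (byteOffset % chunkBytes));
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

In practice a chunk size such as `1 << 30` (1 GiB) keeps the number of mappings small. `get` is `synchronized` only so that the lazy creation of chunks is safe when several threads read at once.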

laune