
Is there some Java equivalent of BufferedReader.skip() that takes a number of lines as its parameter instead of a number of characters?

I want to jump to a specific line in a text file and start reading from that point, without having to go through all the lines of the file and check each line number (there are tens of thousands of them: it's a 3D model OBJ file).

All the examples I found check the line number, which is exactly what I want to avoid.
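For reference, the line-counting approach those examples use comes down to something like the minimal sketch below. The names `LineSkipper`, `skipLines` and `readLineAt` are hypothetical, and the file is assumed to be UTF-8. The sketch also shows why a line-based `skip()` cannot avoid this work: the only way to find where line N starts is to read past N line terminators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineSkipper {
    private LineSkipper() {}

    /**
     * Reads and discards up to {@code n} lines, leaving the reader at the start
     * of line n (0-based). Returns how many lines were actually skipped, which is
     * less than n if the file is shorter.
     */
    static long skipLines(BufferedReader reader, long n) throws IOException {
        long skipped = 0;
        while (skipped < n && reader.readLine() != null) {
            skipped++;
        }
        return skipped;
    }

    /** Returns line {@code index} (0-based) of a UTF-8 file, or null if the file has fewer lines. */
    static String readLineAt(Path file, long index) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            skipLines(reader, index);
            return reader.readLine();
        }
    }
}
```

`bufferedReader.lines().skip(n)` does the same reading internally; it only hides the loop.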

qraqatit
  • Not that I know of. Reading lines requires reading each character, so the only way to do it is to read each character from the beginning. This is why some applications use fixed length "lines," or records. Or use a database. – markspace Feb 23 '22 at 15:37
  • Does this answer your question? [BufferedReader to skip first line](https://stackoverflow.com/questions/23236000/bufferedreader-to-skip-first-line) – sinclair Feb 23 '22 at 15:40
  • @sinclair no, not at all, unfortunately – qraqatit Feb 23 '22 at 15:42
  • hm `Stream lines = bufferedReader.lines().skip(linesToSkip);` should do it – sinclair Feb 23 '22 at 15:43
  • What you failed to parse from all the question is that what you want is impossible. Without an outside source of information about the structure of the file, it's impossible to seek to a certain line number without reading all previous lines as well. That's because "skip 3 lines" simply means "skip characters until you've skipped 3 line-ending indicators". And there's no magic way to detect where those line-endings are. You have to actually read them. – Joachim Sauer Feb 23 '22 at 15:48
  • @sinclair no it does not: it still reads the whole file first to get all the lines (lines()), so I see no difference from exactly what I wanted to avoid – qraqatit Feb 23 '22 at 15:49
  • @JoachimSauer hence my question "Is there in Java..." - that means I do not know if something like that exists (or is even possible at all - if I knew that I would not ask, right?), therefore asking, so you might be right – qraqatit Feb 23 '22 at 15:51
  • @qraqatit: small correction: It reads the file *up to that point*. Whether or not it reads the whole file depends on what else you do with that stream. And reading the file *up to that point* isn't really avoidable, as I said above. – Joachim Sauer Feb 23 '22 at 15:51
  • And now you know: you didn't find a solution because what you want is impossible. There might be alternatives for what you're trying to do, but that depends on what the underlying problem is (for example, continuing to read a file that was previously read partially can be done). – Joachim Sauer Feb 23 '22 at 15:52
  • @JoachimSauer as I said, my primary point is to avoid reading the file every time from the beginning: I am creating a procedural loader which reads a different section of the OBJ file (a different piece of 3D geometry) on every loop. If I first loaded the whole file I would need to wait several tens of seconds just for the read, but what I am doing instead is updating the canvas on every loop with new geometry until the whole 3D object is created – qraqatit Feb 23 '22 at 15:54
  • @qraqatit: if you know (or can assume) that the file doesn't change between multiple attempts you can remember byte offsets when reading and skip immediately to those, effectively building your own index. That's not quite trivial, since if you use a `Reader` you won't have direct/unproblematic access to the underlying `InputStream`, but it's doable. But the easiest way to do this is probably to read the whole file sequentially in a background thread and update some shared datastructure whenever possible. – Joachim Sauer Feb 23 '22 at 16:26
  • @JoachimSauer sounds good indeed (because the file itself won't change at all), if only I could understand what you just wrote well enough to get something practical out of it. I am afraid my Java skills are not good enough to understand it without some sort of simple practical example, I guess. – qraqatit Feb 23 '22 at 16:37
  • Then I suggest you type up a new question with **specific** things you want to do and maybe even your own attempt at it. This is way too far removed from your original question to make sense to further discuss in the comments. – Joachim Sauer Feb 23 '22 at 16:39
  • @JoachimSauer ah, nevermind then... – qraqatit Feb 23 '22 at 16:40
  • So formulating the question is too much work but you expect me to provide the answer here where it won't ever be useful for anyone else. To each their own, I guess. – Joachim Sauer Feb 23 '22 at 16:42
  • @JoachimSauer it's ok, I am currently looking at FileInputStream.skip(), which probably does something like what you said above, I guess (hoping I can figure out how to know/get the actual byte offset of a line) – qraqatit Feb 23 '22 at 16:45
  • Just call `readLine()` *N* times where *N* is the 0-relative line number you want, and use the last return value. Four lines of code. You can read millions of lines per second with `BufferedReader`, so the performance issue is negligible. – user207421 Feb 23 '22 at 23:29
  • @user207421 not true, if I do that I wait like several tens of seconds to completely parse the huge obj file...please read what was the question + explanation/reasoning, we are already beyond this. – qraqatit Feb 23 '22 at 23:44
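The byte-offset index suggested in the comments can be sketched as follows: read the file once, remember where each line starts, and later jump straight there. This is a minimal sketch that assumes a UTF-8 file that does not change between reads; `LineIndex`, `build` and `readLine` are hypothetical names. It scans raw bytes for `'\n'`, which in UTF-8 never occurs inside a multi-byte character and also ends CRLF lines, and then uses `RandomAccessFile.seek()` to go directly to a recorded offset.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class LineIndex {
    private final Path file;
    private final List<Long> lineStarts; // byte offset at which each line begins

    private LineIndex(Path file, List<Long> lineStarts) {
        this.file = file;
        this.lineStarts = lineStarts;
    }

    /** Scans the file once and records the byte offset of every line start. */
    static LineIndex build(Path file) throws IOException {
        List<Long> starts = new ArrayList<>();
        long size = Files.size(file);
        if (size > 0) starts.add(0L);
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                // A new line starts after every '\n', unless that '\n' was the last byte.
                if (b == '\n' && pos < size) starts.add(pos);
            }
        }
        return new LineIndex(file, starts);
    }

    int lineCount() {
        return lineStarts.size();
    }

    /** Seeks straight to line {@code index} (0-based) and decodes it as UTF-8, without "\r\n". */
    String readLine(int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(lineStarts.get(index));
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            int b;
            // Byte-at-a-time reads are fine for short OBJ lines, slow for very long ones.
            while ((b = raf.read()) != -1 && b != '\n') {
                bytes.write(b);
            }
            byte[] raw = bytes.toByteArray();
            int len = raw.length;
            if (len > 0 && raw[len - 1] == '\r') len--; // drop the CR of a CRLF ending
            return new String(raw, 0, len, StandardCharsets.UTF_8);
        }
    }
}
```

Building the index still reads the file once (for example on a background thread), but after that each `readLine(i)` costs one seek instead of a rescan from the beginning.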

1 Answer


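So, the solution is to use FileInputStream.skip() to jump over the bytes that were already processed, using a byte offset that you keep updating while reading.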

UPDATE: manually adding the byte length of the system-specific line separator to each line's byte length on every iteration fixed the incorrect byte skipping, so now it finally works as expected!

Define a long variable in which you store the number of bytes to skip. I did that in my main application class (App):

public static long lineByteOffset = 0;

Then, in the method where you read your lines with BufferedReader, do the following (all the files I read are UTF-8 encoded):

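// At the top of the file:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Inside your reading method:
File objFile = new File(PATH_TO_YOUR_FILE_HERE);

// Byte length of one line terminator. NOTE: this assumes the file uses the
// platform's line separator (CRLF = 2 bytes on Windows, LF = 1 byte elsewhere).
int nls = System.lineSeparator().getBytes(StandardCharsets.UTF_8).length;

// try-with-resources closes the file even when you return from inside the loop
try (FileInputStream fir = new FileInputStream(objFile)) {
    // 1ST IMPORTANT PART: SKIP THE BYTES ALREADY PROCESSED (0 THE FIRST TIME).
    // skip() may skip fewer bytes than requested, so repeat until done.
    long remaining = App.lineByteOffset;
    while (remaining > 0) {
        long skipped = fir.skip(remaining);
        if (skipped <= 0) break; // nothing left to skip
        remaining -= skipped;
    }

    BufferedReader reader = new BufferedReader(new InputStreamReader(fir, StandardCharsets.UTF_8));
    String line;
    while ((line = reader.readLine()) != null) {
        // 2ND IMPORTANT PART: ADVANCE THE OFFSET BY THE LINE'S UTF-8 BYTES PLUS ITS TERMINATOR
        App.lineByteOffset += line.getBytes(StandardCharsets.UTF_8).length + nls;
        /*
        DO YOUR STUFF HERE...
        IN MY CASE IT RETURNS A SPECIFIC BLOCK,
        WHICH EXITS THE WHILE LOOP AS NEEDED,
        SO THAT NEXT TIME IT CONTINUES WHERE WE LEFT OFF
        WITHOUT READING THE WHOLE FILE FROM THE START AGAIN
        */
    }
} catch (FileNotFoundException e) {
    System.err.println("File not found!");
} catch (IOException e) {
    System.err.println("Error reading the file");
}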
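As the comments below point out, `readLine()` hides the line terminator, so adding a fixed `line.separator` length only works when the file's line endings match the platform's. The following is a minimal sketch of a variant that counts the bytes actually consumed, terminators included, so the stored offset stays exact for LF and CRLF files alike. `ChunkReader`, `Chunk` and `readChunk` are hypothetical names, and the file is assumed to be UTF-8.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ChunkReader {
    /** Lines read in one call plus the byte offset at which the next call should resume. */
    static final class Chunk {
        final List<String> lines;
        final long nextOffset;

        Chunk(List<String> lines, long nextOffset) {
            this.lines = lines;
            this.nextOffset = nextOffset;
        }
    }

    private ChunkReader() {}

    /** Reads up to {@code maxLines} UTF-8 lines starting at byte offset {@code offset}. */
    static Chunk readChunk(Path file, long offset, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        long pos = offset;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long remaining = offset;
            while (remaining > 0) {              // skip() may skip fewer bytes than asked
                long n = in.skip(remaining);
                if (n <= 0) break;               // end of file reached
                remaining -= n;
            }
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            int b = 0;
            while (lines.size() < maxLines && (b = in.read()) != -1) {
                pos++;                           // count every byte, terminators included
                if (b == '\n') {
                    lines.add(decode(line));
                    line.reset();
                } else {
                    line.write(b);
                }
            }
            if (b == -1 && line.size() > 0) {
                lines.add(decode(line));         // last line without a trailing newline
            }
        }
        return new Chunk(lines, pos);
    }

    /** Decodes the buffered bytes as UTF-8, dropping a trailing '\r' from CRLF endings. */
    private static String decode(ByteArrayOutputStream buf) {
        byte[] raw = buf.toByteArray();
        int len = raw.length;
        if (len > 0 && raw[len - 1] == '\r') len--;
        return new String(raw, 0, len, StandardCharsets.UTF_8);
    }
}
```

The caller keeps `nextOffset` between calls (much as the answer does with `App.lineByteOffset`) and passes it back as `offset` the next time.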
qraqatit
  • There's a slight risk that the `BufferedReader` doesn't actually use UTF-8, because you didn't specify an encoding and it might use a different one (the platform default encoding). It's better to specify it when creating the `Reader` to ensure that what you're reading matches your use of UTF-8 later. This also has the problem that `readLine` does not return the line termination character(s), so you're not counting those in your `lineByteOffset`. – Joachim Sauer Feb 23 '22 at 22:53
  • How are you going to *obtain* the byte-offset values? – user207421 Feb 23 '22 at 23:31
  • @user207421 It is quite clear from the example I would say, I don't know but isn't that elementary Java coding? Like adding to existing value and stuff? – qraqatit Feb 23 '22 at 23:45
  • Note that this still ignores the length of the newline separator (which is either 1 or 2 bytes), so the start of the next read will "drift backwards" slowly (i.e. you'll start reading more and more data that you've already processed the further down the file you are). Unfortunately, fixing this makes the whole code a lot more complex (as you'd have to stop using `readLine()` and basically do what it does manually). – Joachim Sauer Feb 24 '22 at 08:40
  • @JoachimSauer unfortunately you are right: all of a sudden, after a few lines, it becomes "broken", most probably because of what you just said (empty lines separating individual geometries in the OBJ file), so I still do not have a solution as of now; gotta update this post – qraqatit Feb 24 '22 at 15:54
  • Is literally just keeping the `Reader` open not an option for you? That would be so much simpler – Joachim Sauer Feb 24 '22 at 16:52
  • Sorry, but what exactly do you mean by that? – qraqatit Feb 24 '22 at 16:55
  • Also I think that maybe the problem is not about empty lines, but rather something else, like the byte length of the new line separator. But how come it cannot detect the right length? Or maybe the add operation I have there is not suitable for adding several long values together? I also tested adding the new long values using `Long.sum()`, but the outcome is exactly the same: totally erroneous byte skipping – qraqatit Feb 24 '22 at 17:06
  • First: by just keeping the `Reader` open I mean: don't re-open the file every time you want to continue reading. Just open the `Reader` once when you first start reading and keep it open even when you're not actively reading. That will just hold on to the right position. And the problem with your code isn't empty lines (not any more than normal lines). A "full line" in a text file contains some text (or not) and is followed by a new line indicator (either LF or CR+LF). Those characters are not returned by `readLine()` because you don't usually care about them, but you still need to count them. – Joachim Sauer Feb 25 '22 at 00:57
  • @JoachimSauer right: as you can see in my updated code above from yesterday evening, I added the byte length of the OS-specific newline character(s) to the long variable and everything works as expected now. Thank you for all your hints, they helped a lot. Also, I cannot keep the BufferedReader open between calls because it is called from outside (from another class) with each new geometry... Anyway, there is no need for that anymore either, though it might shorten those read times even further, so maybe in the future I will rewrite my own code to take your suggestion into account. – qraqatit Feb 25 '22 at 10:32
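Keeping the `Reader` open, as suggested in the comments, can still work when the reads are triggered from another class: the open reader can live in an object that the caller holds on to between calls. This is a minimal sketch; `ObjSectionReader` and `nextBlock` are hypothetical names, and treating blank lines as the boundary between geometry blocks is only an assumption based on the comment about empty lines separating geometries.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Keeps one BufferedReader open so each call continues where the previous one stopped. */
final class ObjSectionReader implements Closeable {
    private final BufferedReader reader;

    ObjSectionReader(Path file) throws IOException {
        this.reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
    }

    /**
     * Returns the next block of non-blank lines, or an empty list at end of file.
     * ASSUMPTION: blocks are separated by one or more blank lines.
     */
    List<String> nextBlock() throws IOException {
        List<String> block = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                if (!block.isEmpty()) break; // a blank line after content ends the block
            } else {
                block.add(line);
            }
        }
        return block;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

The calling class would create one `ObjSectionReader`, call `nextBlock()` once per loop iteration, and close it when the model is complete (for example with try-with-resources). No byte offsets need to be tracked at all.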