How to handle CR and LF in the proper way in Java (android)

Question

So I have noticed that the bytes CR (13) and LF (10) are not fully respected in Java. When there is a CR byte, it doesn't just return the carriage, it also creates a new line. That is odd, because CR literally stands for Carriage Return and LF stands for Line Feed, so they are two separate things. Anyway, I have accepted this part, which means I have to write my own algorithm to implement support for real CR and LF actions (see this post for details about CR & LF).

Basically I have a terminal that is connected to a Bluetooth device, from which I receive a stream of bytes. I append the incoming bytes to the previously received bytes and store them in a byte array. To visualize what is going on for the user, I convert this to a string and put it in a TextView in Android as a terminal view. So when there is a CR byte, the text that follows has to be shown starting from the previous LF. For example (here I use a string and convert it to bytes, because that is easier to read than a series of bytes):

byte[] text = "abcd\rghi\njklmnop\r\nqr\n\rHello world!\rByeee".getBytes();

Results in output:

ghid
   jklmnop
qr
Byeee world!

For this I have created the following algorithm that works ...-ish:

public static byte[] handleCRLF(byte[] text, int lineBuffer) {
    // Make a byte array for the new text with the size of the line buffer
    byte[] newText = new byte[lineBuffer];

    int writingPointer = 0;
    int lfPointer = 0;

    // Loop through the contents of the text
    for (int i = 0; i < text.length; i++) {

        // Check if byte text[i] is equal to 13 which is CR
        if (text[i] == 13) {
            // Write a pointer of the text to the last LF position to start at a new line
            writingPointer = lfPointer;

        }
        // Check if byte text[i] is equal to 10 which is LF
        else if (text[i] == 10) {
            // Calculate the size of the new text when there is an LF
            int size = newText.length + lineBuffer;

            // Make a temporary byte array with the new size
            byte[] tmp = new byte[size];

            // Fill the temporary byte array with the new text
            for (int j = 0; j < newText.length; j++) {
                tmp[j] = newText[j];
            }

            // End the temporary byte array with an LF
            tmp[newText.length - 1] = 10;

            // Set the temporary byte array as the new Text contents
            newText = tmp;

            // Move the writing pointer forward
            writingPointer += lineBuffer;

            // Set the lf pointer based on the size minus the line buffer
            lfPointer = size - lineBuffer;

        }
        else {
            // Check if the writing pointer is not bigger, should not be the case but just in case
            if (writingPointer >= newText.length) continue;

            // Write text[i] on the position of the writing pointer
            newText[writingPointer] = text[i];

            // Increase the writing pointer to the next
            writingPointer++;
        }
    }

    // Replacing null with empty spaces
    for (int i = 0; i < newText.length; i++) {
        if (newText[i] != 0) continue;

        newText[i] = 32;
    }

    return newText;
}

This works great in a way, but it makes use of a so-called "line buffer". Every line has a fixed size, which results in a very big byte array with a lot of empty space...

Example of the text when replacing the empty space with * and a lineBuffer of 128:

ghid***************************************************************************************************************************
***jklmnop*********************************************************************************************************************
qr*****************************************************************************************************************************
Byeee world!********************************************************************************************************************

As you can see there are quite some * symbols...

My question is: is this a proper way of dealing with CR and LF in a custom way? If so, how can I improve it so that no space is wasted? Currently I work around this in a hacky way: I convert the result to a string, then go over every line and trim the end of each line. This seems awkward and not efficient at all.

I have tried avoiding using the linebuffer and instead continue building it up but every time the result was wrong.

For my question I have searched quite a lot but couldn't find the answer, apologies if this is a duplicate question which has a proper solution. Couldn't find it sadly.

Thank you in advance!

"It also creates a new line": no it doesn't. Examine your assumptions, and your output, which shows no evidence of this claim. The string `ghid` does not appear in your source code: *ergo* either it wasn't produced or this isn't the real code. `\n\r` is not a valid line terminator. Apart from `ghid` the output is exactly as expected. CR returns the carriage to the left, and LF advances it one line. There is no problem here to solve. — user207421, Jul 06 '22 at 10:24
Try it in an online compiler before saying things. When you write any string, doesn't matter if `\n\r` is switched or not. Should not matter in case of receiving a byte stream. Besides that, the operating system of android translates a `\r` to `\r\n` automatically when converting to a string, doesn't fill the empty spaces which is required. Check this tutorial point: http://tpcg.io/_KWYJTX totally wrong output :). Again, check before you comment or downvote. @user207421 and ofcourse the string ghid doesn't appear in my source, that is the effect of `\r`. Check the link I provided — CSicking, Jul 06 '22 at 10:33
@user207421 I recommend for you to read the full theory behind the original workings of `CR` and `LF`. I linked it in my post. `CR` (13 or `\r`) stands for carriage return. Returns the carriage to the left. So this doesn't add any new line feed. Which means if you have `abcd\rghi` it should result in: `ghid` when compiled. Not on a new line. — CSicking, Jul 06 '22 at 11:29
*Returns the carriage to the left. So this doesn't add any new line feed.* Sounds logical and possibly correct historically. The only thing is, on old Macs the line separator was ... `'\r'` — g00se, Jul 06 '22 at 12:42
@g00se exactly, and that makes the proper `\r` and `\n` implementation more difficult to do it 'historically' proper. Now it feels like they work the same without any differences except for the name and byte when converted to a string. Sure there might be some differences depending on platforms / text readers. — CSicking, Jul 06 '22 at 12:54
@Clicking I already know the full theory behind the original workings of CR and LF, and that's how I know that LF CR is invalid. Teletypes were built to allow processing of LF while a CR was in progress, but not the other way around. I have seen this for myself in 1971. As to the CR causing `abcd` to be overwritten by `ghi`, that is exactly what is happening here. I don't know why you mention it. As I said, I don't see anything here that departs from the 'full theory behind the original workings of CR and LF'. — user207421, Jul 07 '22 at 00:30

score 1 · Accepted Answer · answered Jul 06 '22 at 11:35

This is a fun little challenge. It's pretty easy to improve the algorithm to avoid the useless spaces at the end of each line (see implementation below). For dynamically resizing the output buffer, there are a couple of options:

1. Continue to use `byte[]` and manually realloc+copy for each line.
   Con: Inefficient (copying is O(n), and if you do this for each line the overall time complexity is O(n^2)).
2. Calculate the output buffer size using a separate pass over the input.
   Con: Complicated, will probably result in code duplication.
3. Use `byte[]` and manually realloc+copy, but double the size each time for efficiency (see Dynamic array).
   Con: A little tedious. (Though `Arrays.copyOf()` helps a lot.)
4. Use `ArrayList<Byte>` (Java's dynamic array implementation).
   Con: High overhead due to Java's boxing (each byte must be wrapped in an object).
5. Use `ByteArrayOutputStream`, which resizes automatically.
   Problem: Does not support seeking (which is required for handling `\r`).
   Solution (implemented below): Subclass `ByteArrayOutputStream` to get access to the underlying buffer.
6. Use `ByteBuffer`, which supports absolute writes.
   Con: Capacity is fixed, so you'd have to manually realloc.
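
For comparison, the option of a plain `byte[]` that doubles in size could look roughly like the sketch below. The class name `GrowableCrLf` and the helper `ensureCapacity` are hypothetical names chosen for illustration, and the sketch assumes one byte per screen column (plain ASCII input). It is one possible shape for that option, not a tuned implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class GrowableCrLf {
    // Hypothetical helper: returns a buffer that can hold at least `needed` bytes.
    // Capacity is at least doubled on growth, so appends are amortized O(1).
    static byte[] ensureCapacity(byte[] buf, int needed) {
        if (needed <= buf.length) {
            return buf;
        }
        int newCapacity = Math.max(needed, Math.max(16, buf.length * 2));
        return Arrays.copyOf(buf, newCapacity);
    }

    static byte[] processCRLF(byte[] input) {
        byte[] out = new byte[Math.max(16, input.length)];
        int size = 0; // number of valid bytes in out
        int pos = 0;  // offset at which the next byte is written
        int col = 0;  // cursor column within the current line
        for (byte b : input) {
            if (b == '\r') {
                // back to the start of the current line
                pos -= col;
                col = 0;
            } else if (b == '\n') {
                // new line at the end of the output, padded to keep the column
                out = ensureCapacity(out, size + 1 + col);
                pos = size;
                out[pos++] = '\n';
                for (int i = 0; i < col; i++) {
                    out[pos++] = ' ';
                }
                size = pos;
            } else {
                // normal byte: overwrite or append
                out = ensureCapacity(out, pos + 1);
                out[pos++] = b;
                col++;
                size = Math.max(size, pos);
            }
        }
        // trim the unused capacity
        return Arrays.copyOf(out, size);
    }

    public static void main(String[] args) {
        byte[] input = "abcd\rghi\njklmnop\r\nqr\n\rHello world!\rByeee"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(processCRLF(input), StandardCharsets.US_ASCII));
    }
}
```

The bookkeeping (`size`, `pos`, `col`) has to be done by hand here, which is the tedious part mentioned above.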

Here is an implementation that uses a custom ByteArrayOutputStream.

import java.io.ByteArrayOutputStream;

class Main {
    public static void main(String[] args) {
        byte[] input = "abcd\rghi\njklmnop\r\nqr\n\rHello world!\rByeee".getBytes();
        byte[] output = processCRLF(input);
        System.out.write(output, 0, output.length);
    }

    public static byte[] processCRLF(byte[] input) {
        RandomAccessByteArrayOutputStream output = new RandomAccessByteArrayOutputStream(input.length);
        int pos = 0; // the offset in the output at which characters will be written (normally equal to output.size(), but will be less after '\r')
        int col = 0; // the position of the cursor within the current line (used to determine how far back to go on '\r', and how many spaces to insert on '\n')
        for (byte b : input) {
            if (b == '\r') {
                // go back to the start of the line (future characters will overwrite the line)
                pos -= col;
                col = 0;
            } else if (b == '\n') {
                // start a new line
                pos = output.size();
                output.putOrWrite(pos++, (byte) '\n');
                // if the cursor wasn't in column 0, insert spaces to stay in the same column
                for (int i = 0; i < col; i++) {
                    output.putOrWrite(pos++, (byte) ' ');
                }
            } else {
                // normal character
                output.putOrWrite(pos++, b);
                col++;
            }
        }
        return output.toByteArray();
    }
}

class RandomAccessByteArrayOutputStream extends ByteArrayOutputStream {
    public RandomAccessByteArrayOutputStream() {}

    public RandomAccessByteArrayOutputStream(int size) {
        super(size);
    }

    public void put(int index, byte b) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException();
        }
        buf[index] = b;
    }

    public void putOrWrite(int index, byte b) {
        // like put(), but allows appending by setting 'index' to the current size
        if (index == size()) {
            write(b);
        } else {
            put(index, b);
        }
    }
}
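
One thing to keep in mind when turning the result into a string for display: `pos` and `col` count bytes, so the approach treats one byte as one column. That holds for single-byte encodings such as ASCII, but not for multi-byte UTF-8 sequences. Below is a minimal sketch of the decoding step, assuming the device sends plain ASCII. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class DisplayText {
    // Decode with an explicit charset instead of relying on the platform default.
    // Assumption: the processed bytes are plain ASCII.
    static String toDisplayText(byte[] processed) {
        return new String(processed, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] processed = "ghid\n   jklmnop\nqr\nByeee world!"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDisplayText(processed));
    }
}
```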

Testing it now in a test environment and works like a charm. Thank you for the detailed explanation and work. Highly appreciated! Can see why you used `ByteArrayOutputStream` makes it an easier approach but with a problem that is not that difficult to solve. — CSicking, Jul 06 '22 at 11:43
