How to split a Vec by a sequence of chars?

Question

I want to extract the payload of an HTTP request as a Vec<u8>. In the request, the payload is separated from the headers by the sequence \r\n\r\n, so I want to split my Vec at that position and take the second part.

My current solution is to use the following function I wrote.

fn find_payload_index(buffer: &Vec<u8>) -> usize {
    for (pos, e) in buffer.iter().enumerate() {
        if pos < 3 {
            continue
        }
        if buffer[pos - 3] == 13 && buffer[pos - 2] == 10 && buffer[pos - 1] == 13 && buffer[pos] == 10 {
            return pos + 1;
        }
    }
    0
}

13 is the ASCII value of \r and 10 is the value of \n. I then split at the returned index. This solution works, but it feels very unclean, and I am wondering how to do this in a more elegant way.

score 3 · Accepted Answer · answered Nov 15 '20 at 20:37

First off:

A function should almost never have a &Vec<_> parameter.

See "Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?".
Don't use the magic values 10 and 13; Rust supports byte literals: b'\r' and b'\n'.
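
Applied to the loop from the question, those two points might look like the sketch below. The parameter becomes a slice (a &Vec<u8> still coerces to &[u8] at the call site) and the magic numbers become byte literals. Everything else, including the 0 fallback, is kept as in the question and is not meant as a recommendation.

```rust
// Same logic as the loop in the question, with a slice parameter and byte literals.
fn find_payload_index(buffer: &[u8]) -> usize {
    // Starting at 3 replaces the `if pos < 3 { continue }` check;
    // the range is empty when the buffer is shorter than 4 bytes.
    for pos in 3..buffer.len() {
        if buffer[pos - 3] == b'\r'
            && buffer[pos - 2] == b'\n'
            && buffer[pos - 1] == b'\r'
            && buffer[pos] == b'\n'
        {
            // Index of the first byte after the separator.
            return pos + 1;
        }
    }
    // Kept from the question: 0 when the separator is absent.
    0
}
```

A call such as find_payload_index(&request), with request: Vec<u8>, compiles unchanged because of deref coercion.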

As for your question: I believe you can make it a bit simpler using windows and matches! with a byte string literal pattern:

fn find_payload_index(buffer: &[u8]) -> Option<usize> {
    buffer
        .windows(4)
        .enumerate()
        .find(|(_, w)| matches!(*w, b"\r\n\r\n"))
        .map(|(i, _)| i)
}

Permalink to the playground with test cases.
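
One detail to watch when swapping this in: the function in the question returned the index just after the separator, while this version returns the index where the separator starts. A minimal sketch of the split, assuming a hypothetical helper named split_request that is not part of the answer:

```rust
/// Index where the first "\r\n\r\n" starts, as in the answer above.
fn find_payload_index(buffer: &[u8]) -> Option<usize> {
    buffer
        .windows(4)
        .enumerate()
        .find(|(_, w)| matches!(*w, b"\r\n\r\n"))
        .map(|(i, _)| i)
}

/// Hypothetical helper: the bytes before the separator and the payload after it.
fn split_request(buffer: &[u8]) -> Option<(&[u8], &[u8])> {
    let start = find_payload_index(buffer)?;
    // The separator is 4 bytes long, so the payload begins at start + 4.
    Some((&buffer[..start], &buffer[start + 4..]))
}
```

The payload can then be copied into its own Vec<u8> with to_vec() if an owned value is needed.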

Thanks a million, @mcarton, that is exactly what I was looking for! Also, thank you for the general Rust advice :) — Robinbux, Nov 15 '20 at 20:44

score 2 · Answer 2 · answered Nov 15 '20 at 22:35

Note that slice has a starts_with method which will more easily do what you want:

fn find_payload_index(buffer: &[u8]) -> usize {
    for i in 0..buffer.len() {
        if buffer[i..].starts_with(b"\r\n\r\n") {
            return i
        }
    }
    panic!("malformed buffer without the sequence")
}

I see no reason to use enumerate if the element itself is never used; simply looping over 0..buffer.len() seems the easiest solution to me.

I have also elected to make the function panic, rather than return 0, when the separator is missing, which I believe is more proper. In the end you should probably return some kind of Result value and handle the error case cleanly if the input is malformed, but you should never return 0 in this case.
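
A Result-returning variant could look roughly like this. MissingSeparator is a hypothetical error type invented for the sketch; a real application would probably fold this case into its own error type.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not part of the answer.
#[derive(Debug, PartialEq)]
struct MissingSeparator;

impl fmt::Display for MissingSeparator {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "buffer does not contain \\r\\n\\r\\n")
    }
}

impl std::error::Error for MissingSeparator {}

/// Same search as above, but a missing separator is reported as an error
/// instead of causing a panic.
fn find_payload_index(buffer: &[u8]) -> Result<usize, MissingSeparator> {
    for i in 0..buffer.len() {
        if buffer[i..].starts_with(b"\r\n\r\n") {
            return Ok(i);
        }
    }
    Err(MissingSeparator)
}
```

The caller can then use ? or match on the result, so a malformed request does not bring the whole program down.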

score 0 · Answer 3 · answered May 31 '21 at 13:32

A shorter alternative to @mcarton's answer would be to use position:

fn find_payload_index(buffer: &[u8]) -> Option<usize> {
    buffer
        .windows(4)
        .position(|arr| arr == b"\r\n\r\n")
}
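
The same windows plus position idea extends to an arbitrary delimiter, which is closer to the title of the question. The sketch below uses a hypothetical function name, find_subsequence, and makes one assumption of its own: an empty needle is treated as matching at index 0, because windows panics when given a size of 0.

```rust
/// Hypothetical generalisation: index of the first occurrence of `needle` in `haystack`.
fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows` panics on a size of 0, so handle the empty needle explicitly.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}
```

If the haystack is shorter than the needle, windows yields nothing and the function returns None.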

answered by luizc