
I'd like to achieve something similar to the code below, but without incurring the costs of String (mainly UTF-8 validation), by using &[u8] instead:

fn main() {
    let mut outer: Vec<String> = Vec::new();
    {
        for _ in 0..1 {
            let mut inner: Vec<&str> = Vec::new();
            inner.push("hello");
            inner.push("world");
            outer.push(inner.join(","));
        }
    }
}

But my &[u8]-ified attempt below does not compile (playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=20130b6377241f1566a724440522443e):

fn main() {
    let mut outer: Vec<&[u8]> = Vec::new();
    {
        for _ in 0..1 {
            let mut inner: Vec<&[u8]> = Vec::new();
            inner.push(b"hello");
            inner.push(b"world");
            outer.push(&inner.join(&b','));
        }
    }
}

Rustc complains that:

error[E0716]: temporary value dropped while borrowed
 --> src/lib.rs:8:25
  |
8 |             outer.push(&inner.join(&b'!'));
  |             -----       ^^^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
  |             |           |
  |             |           creates a temporary which is freed while still in use
  |             borrow later used here
  |
  = note: consider using a `let` binding to create a longer lived value

While I do understand why the Rust compiler rejects this code, I do not know how to fix it while keeping the concept intact. I basically want to transfer ownership of the result of inner.join(&b',') into the outer Vec. More generally, I want to create temporary [u8] values for text in a loop using join() and push them into an outer Vec without running into ownership issues.

Background: I am parsing an enormous quantity (hundreds of terabytes) of CSV files using the csv crate, which has variants working with ByteBuffers and with Strings, the latter having a measurable negative performance impact. In the full code, I loop through some fields of each CSV record and sometimes modify the value. The selected fields are added to the "inner" Vec, which I then want to turn back into a CSV record with join() and add to the Vec of selected records. To avoid the noise of the full code, including CSV parsing etc., I created the minimal reproducible example above to depict the kind of problem I am facing.

Disclaimer: Still learning Rust, so your understanding and some didactic feedback would be appreciated.

Mark vL
  • What's the purpose of the for-loop in your examples? Also, what particular costs of String are you referring to? Do you mean the heap allocation cost, the UTF8 validation cost, or both? – pretzelhammer May 22 '20 at 15:06
  • The final compiler note says _"consider using a `let` binding to create a longer lived value"_. Didn't that solve the problem? – E_net4 May 22 '20 at 15:11
  • `inner.join(&b',')` would be a `Vec<_>` here. Who do you imagine will own that if you only store a reference to it? – mcarton May 22 '20 at 15:12
  • [Here's the answer to the other question applied here](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e6269fcd2b07a3b1382c143afff21983). Like pretzelhammer, I'm not sure what "costs" you are hoping to avoid -- `[u8]::join` returns a `Vec` already, so you are not incurring any additional allocations by moving it -- so if this is not the answer you are looking for, please [edit] your question to contain more detail. – trent May 22 '20 at 15:27
  • Just edited my post to add some background + clarify the overhead (= UTF8 validation mainly) – Mark vL May 22 '20 at 15:29
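
Following the comments from mcarton and trent, one way to keep the concept intact might be to let the outer Vec own the joined bytes instead of borrowing them. The sketch below assumes that changing the outer type to `Vec<Vec<u8>>` is acceptable; `join_fields` and `build_outer` are hypothetical helper names introduced only for illustration. Since `join` on a slice of `&[u8]` already returns a `Vec<u8>`, pushing that value moves it without another allocation, and no UTF-8 validation is involved.

```rust
/// Hypothetical helper: joins the selected fields with `sep`
/// and returns the result as owned bytes.
fn join_fields(fields: &[&[u8]], sep: u8) -> Vec<u8> {
    // `join` on `[&[u8]]` with a `&u8` separator yields a `Vec<u8>`.
    fields.join(&sep)
}

/// Hypothetical helper mirroring the loop from the question.
fn build_outer() -> Vec<Vec<u8>> {
    // The outer Vec owns each joined record, so nothing borrowed
    // has to outlive the loop body.
    let mut outer: Vec<Vec<u8>> = Vec::new();
    for _ in 0..1 {
        let mut inner: Vec<&[u8]> = Vec::new();
        inner.push(b"hello");
        inner.push(b"world");
        // The joined Vec<u8> is moved into `outer`, not referenced.
        outer.push(join_fields(&inner, b','));
    }
    outer
}

fn main() {
    let outer = build_outer();
    assert_eq!(outer, vec![b"hello,world".to_vec()]);
}
```

If borrowed access is needed later, each element can still be viewed as a `&[u8]` through `as_slice()` or `&outer[i][..]`.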

0 Answers