
Let's say I have a String, "Foo Bar", and I want to create a substring "Bar" without allocating new memory.

So I offset the raw pointer of the original string to the start of the substring (in this case by 4 bytes) and used the String::from_raw_parts() function to create the String.

This is the code I have so far. As far as I understand, it should do this just fine, but it doesn't work and I don't understand why.

use std::mem;

fn main() {
    let s = String::from("Foo Bar");

    let ptr = s.as_ptr();

    mem::forget(s);

    unsafe {
        // no error when using ptr.add(0)
        let txt = String::from_raw_parts(ptr.add(4) as *mut _, 3, 3);

        println!("{:?}", txt); // This even prints "Bar" but crashes afterwards

        println!("prints because 'txt' is still in scope");
    }

    println!("won't print because 'txt' was dropped");
}

I get the following error on Windows:

error: process didn't exit successfully: `target\debug\main.exe` (exit code: 0xc0000374, STATUS_HEAP_CORRUPTION)

And these on Linux (with cargo run and cargo run --release):

munmap_chunk(): invalid pointer

free(): invalid pointer

I think it has something to do with the destructor of String, because as long as txt is in scope the program runs just fine.

Another thing to notice is that when I use ptr.add(0) instead of ptr.add(4) it runs without an error.

Creating a slice, on the other hand, didn't give me any problems, and dropping it worked just fine:

let t = unsafe { std::slice::from_raw_parts(ptr.add(4), 3) };

In the end I want to split an owned String in place into multiple owned Strings without allocating new memory.

Any help is appreciated.

Peter Hall
Dan
  • Why do you need the segments to be owned? Is it for mutability or just to avoid the lifetimes being bound to the scope of the original `String`? – Peter Hall Jun 23 '19 at 13:06
  • Mostly because of lifetimes, since the original String should be dropped after splitting – Dan Jun 23 '19 at 13:25

1 Answer


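The reason for the errors is the way the allocator works. It is Undefined Behaviour to ask the allocator to free a pointer that it didn't give you in the first place. In this case, the allocator allocated 7 bytes for s and returned a pointer to the first one. However, when txt is dropped, it tells the allocator to deallocate a pointer to byte 4, which it has never seen before. This is why there is no crash when you use add(0) instead of add(4): the allocator recognizes that pointer. The add(0) version is still Undefined Behaviour, though, because String::from_raw_parts also requires the capacity to be the one the buffer was allocated with (7 here, not 3); it just happens not to crash.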
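To illustrate the rule, here is a minimal sketch of a sound round trip, following the pattern used in the `String::from_raw_parts` documentation: the buffer is handed back with exactly the pointer, length and capacity that came out of the original String. `round_trip` is a hypothetical helper name used only for this illustration.

```rust
use std::mem::ManuallyDrop;

// Hypothetical helper: take a String apart into its raw parts and rebuild it.
fn round_trip(s: String) -> String {
    // Stop `s` from freeing its buffer while we hold the raw parts.
    let mut s = ManuallyDrop::new(s);
    let ptr = s.as_mut_ptr();
    let len = s.len();
    let capacity = s.capacity();

    // Sound: this is the same pointer and capacity the allocator handed out,
    // so dropping the result later frees the allocation exactly once.
    unsafe { String::from_raw_parts(ptr, len, capacity) }
}

fn main() {
    let rebuilt = round_trip(String::from("Foo Bar"));
    assert_eq!(rebuilt, "Foo Bar");
    println!("{}", rebuilt);
}
```

Offsetting `ptr` before the call, or passing a smaller capacity, breaks exactly this contract.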

Using unsafe correctly is hard, and you should avoid it where possible.


Part of the purpose of the &str type is to allow portions of an owned string to be shared, so I would strongly encourage you to use those if you can.
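For example, here is a minimal sketch using the standard `str::split_at` method. Both halves are `&str` views into the original buffer, which the pointer comparison at the end checks (it only compares addresses and never dereferences anything).

```rust
fn main() {
    let s = String::from("Foo Bar");

    // `split_at` only computes two views into the existing buffer; nothing is copied.
    let (foo, bar) = s.split_at(4);
    assert_eq!(foo, "Foo ");
    assert_eq!(bar, "Bar");

    // `bar` starts 4 bytes into the original allocation.
    assert_eq!(bar.as_ptr(), s.as_ptr().wrapping_add(4));
    println!("{:?} {:?}", foo, bar);
}
```

The catch is that `foo` and `bar` borrow from `s`, so `s` has to outlive them.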

If you can't use &str on its own because you aren't able to tie the lifetimes back to the original String, there are still some solutions, each with different trade-offs:

  1. Leak the memory, so it's effectively static:

    let s = String::from("Foo Bar");
    let s: &'static str = Box::leak(s.into_boxed_str());
    
    let txt: &'static str = &s[4..];
    let s: &'static str = &s[..4];
    

    Obviously, you can only do this a few times in your application: leaked memory is never freed, so doing it repeatedly will use up memory that you can't get back.

  2. Use reference-counting to make sure that the original String stays around long enough for all of the slices to remain valid. Here is a sketch of a solution:

    use std::{fmt, ops::Deref, rc::Rc};
    
    struct RcStr {
        rc: Rc<String>,
        start: usize,
        len: usize,
    }
    
    impl RcStr {
        fn from_rc_string(rc: Rc<String>, start: usize, len: usize) -> Self {
            RcStr { rc, start, len }
        }
    
        fn as_str(&self) -> &str {
            &self.rc[self.start..self.start + self.len]
        }
    }
    
    impl Deref for RcStr {
        type Target = str;
        fn deref(&self) -> &str {
            self.as_str()
        }
    }
    
    impl fmt::Display for RcStr {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            fmt::Display::fmt(self.as_str(), f)
        }
    }
    
    impl fmt::Debug for RcStr {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            fmt::Debug::fmt(self.as_str(), f)
        }
    }
    
    fn main() {
        let s = Rc::new(String::from("Foo Bar"));
    
        let txt = RcStr::from_rc_string(Rc::clone(&s), 4, 3);
        let s = RcStr::from_rc_string(Rc::clone(&s), 0, 4);
    
        println!("{:?}", txt); // "Bar"
        println!("{:?}", s);  // "Foo "
    }
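Since the comments mention wanting to drop the original String after splitting, here is a minimal sketch of how the RcStr idea could cover that. It restates a cut-down RcStr (only `Deref`) so that it compiles on its own, and `split_shared` is a hypothetical helper that splits on whitespace. Each piece's start offset is computed by subtracting the buffer's base address from the piece's address, which works because `split_whitespace` yields subslices of that same buffer.

```rust
use std::{ops::Deref, rc::Rc};

// Cut-down version of the RcStr sketch: a shared buffer plus a byte range.
struct RcStr {
    rc: Rc<String>,
    start: usize,
    len: usize,
}

impl Deref for RcStr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.rc[self.start..self.start + self.len]
    }
}

// Hypothetical helper: split `s` on whitespace into pieces that all share
// the single original allocation.
fn split_shared(s: String) -> Vec<RcStr> {
    let rc = Rc::new(s);
    let base = rc.as_str().as_ptr() as usize;
    let pieces: Vec<RcStr> = rc
        .split_whitespace()
        .map(|piece| RcStr {
            rc: Rc::clone(&rc),
            // Byte offset of `piece` inside the shared buffer.
            start: piece.as_ptr() as usize - base,
            len: piece.len(),
        })
        .collect();
    pieces
}

fn main() {
    let pieces = split_shared(String::from("Foo Bar Baz"));
    let words: Vec<&str> = pieces.iter().map(|p| &**p).collect();
    assert_eq!(words, ["Foo", "Bar", "Baz"]);

    // The original String can no longer be named, but its buffer lives on,
    // owned jointly by the three pieces.
    assert_eq!(Rc::strong_count(&pieces[0].rc), 3);
    println!("{:?}", words); // ["Foo", "Bar", "Baz"]
}
```

When the last piece is dropped, the count reaches zero and the String is freed normally, so there is no leak and no unsafe code.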
    
Peter Hall
  • [Here's the source for the curious](https://doc.rust-lang.org/src/alloc/vec.rs.html#1233-1249) (`String::split_off` calls `Vec::split_off`). – trent Jun 23 '19 at 12:11
  • Yeah, just looked at the source as well. Is there any way to tell Rust that the offset pointer is a new pointer it can free? Also std::ptr::copy sounds promising, but doesn't work either. – Dan Jun 23 '19 at 12:14
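As the linked source suggests, `String::split_off` is the safe way to get two owned Strings, but it copies the tail into a fresh allocation, which is what the question was trying to avoid. A small sketch that checks this:

```rust
fn main() {
    let mut s = String::from("Foo Bar");
    let cap = s.capacity();

    // `s` keeps bytes [0, 4); the tail "Bar" is copied into a new String.
    let tail = s.split_off(4);
    assert_eq!(s, "Foo ");
    assert_eq!(tail, "Bar");

    // The original buffer keeps its capacity, and the tail lives in a separate
    // allocation rather than 4 bytes into the old one (which is still live).
    assert_eq!(s.capacity(), cap);
    assert_ne!(tail.as_ptr(), s.as_ptr().wrapping_add(4));
    println!("{:?} {:?}", s, tail);
}
```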