
I work on a Rust library used, through C headers, in a Swift UI.

I can read data from Swift in Rust, but I can't immediately write what I've just read back to Swift (i.e. from Rust).

--

Basically, I successfully convert a *const i8 containing hello world into a String.

But the same String is not handled consistently by as_ptr(), so the resulting pointer cannot be parsed as UTF-8 in Swift:

  1. Swift sends hello world as a *const i8
  2. Rust handles it successfully through let input: &str (the first print in get_message()) and correctly prints hello world
  3. Now I can't convert this input &str back to a pointer:
  • the pointer can't be decoded by Swift
  • the pointer value changes on every call of the function (it should always be the same, as with "hello world".as_ptr())

Basically, why does

  • "hello world".as_ptr() always have the same output and can be decoded by Swift,
  • while input.as_ptr() has a different output every time it is called and can never be decoded by Swift (even though printing input correctly shows hello world)?

Do you guys have ideas?

use std::ffi::CStr;

#[derive(Debug)]
#[repr(C)]
pub struct MessageC {
    pub message_bytes: *const u8,
    pub message_len: libc::size_t,
}

/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
unsafe fn c_string_safe(cstring: *const i8) -> String {
    CStr::from_ptr(cstring).to_string_lossy().into_owned()
}

/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
/// on `async extern "C"` => <https://stackoverflow.com/a/52521592/7281870>
#[no_mangle]
#[tokio::main] // allow async function, needed to call here other async functions (not this example but needed)
pub async unsafe extern "C" fn get_message(
    user_input: *const i8,
) -> MessageC {
    let input: &str = &c_string_safe(user_input);
    println!("from Swift: {}", input); // [consistent] from Swift: hello world
    println!("converted to ptr: {:?}", input.as_ptr()); // [inconsistent] converted to ptr: 0x60000079d770 / converted to ptr: 0x6000007b40b0
    println!("directly to ptr: {:?}", "hello world".as_ptr()); // [consistent] directly to ptr: 0x1028aaf6f
    MessageC {
        message_bytes: input.as_ptr(),
        message_len: input.len() as libc::size_t,
    }
}

Asked by Ontokrat (edited by user4815162342).
  • Can you elaborate on what problem you are encountering exactly? The only description you give is "failed to be handled _with consistency_", and it's far from obvious what that means. If possible, state it in the form of "I expected xyz to happen, but instead I observed qwx". – user4815162342 Jan 10 '22 at 11:15
  • Also, the way you construct `MessageC` looks unsound. `input.as_ptr()` converts `input`, a string slice, to a pointer. *However*, the string slice borrows from the data owned by the `String` returned by `c_string_safe(user_input)`. Since this (unnamed) `String` lasts only until the end of the function, you are basically returning a dangling pointer. – user4815162342 Jan 10 '22 at 11:17
  • @user4815162342 Description updated. Do you mean `input` data is deleted from memory before being converted to a pointer? – Ontokrat Jan 10 '22 at 11:40
  • `input` data is _deallocated_ as soon as the function returns, because it's owned by a variable local to the function. Deallocated is not the same as deleted: deallocated memory might appear untouched, but you're not allowed to access it, and it can be reused for other allocations (possibly from other threads). Even if it appears to work, that is only by accident, because deallocation doesn't clear the memory; it just marks the region as available. – user4815162342 Jan 10 '22 at 12:15
  • @user4815162342 I added a lot more details, let me know if it's still unclear. Thanks for your time. – Ontokrat Jan 10 '22 at 12:36
  • The problem is almost certainly due to the string being deallocated when the function returns. Do you understand that the `MessageC` value returned by `get_message()` contains a dangling pointer? – user4815162342 Jan 10 '22 at 12:39
  • @user4815162342 Well, to be honest, I don't understand it very well. I know that's the problem, but I can't figure out how to make the value of `input` "last long enough" to be converted to a pointer. Should I copy/clone the value of `input` into another variable? – Ontokrat Jan 10 '22 at 12:45
  • The easiest way to avoid the data getting deallocated is to `forget()` the string that owns it, [like this](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5df9a9749b52b0024f0ddd16bed4f48a). In that case you need a separate function to eventually deallocate the data by re-creating the owned string and immediately dropping it. The example includes it as `free_message_c()`. Hope this helps. – user4815162342 Jan 10 '22 at 12:48
  • Also, `"hello world".as_ptr()` works because `"hello world"` is a static `&str` which is baked into the executable and never gets deallocated. (There's no reason to, since it's just a constant inserted into the text segment of the executable by the compiler.) – user4815162342 Jan 10 '22 at 12:53
  • Also, for the Rust->Swift direction, keep in mind the bytes will not be null-terminated unless you use `CStr`/`CString` (as you do for the Swift->Rust direction); however, there are initializers like `String.init(bytes:encoding:)` where this shouldn't be an issue. – Coder-256 Jan 10 '22 at 12:58
  • @user4815162342 Thanks for the detailed answer: it totally solves the issue. What about writing an answer (there are so few about Rust & FFI, and you gave me some good teaching in your comments)? Just to understand, do you also mean that `free_message_c()` should be a second, independent call just after `get_message()`? Last, when you say "Also, `"hello world".as_ptr()` works because `"hello world"` is a static `&str` which is baked into the executable and never gets deallocated.", do you imply here too that we should avoid the data getting deallocated? – Ontokrat Jan 10 '22 at 19:24
  • @Coder-256 Thanks. Actually, `MessageC` allows me on the `Swift` side to recompose the string with `let buffer = UnsafeBufferPointer(start: messageBytes, count: messageLen); let string = String(bytes: buffer, encoding: String.Encoding.utf8)`. Is that not good practice? – Ontokrat Jan 10 '22 at 19:29
  • @Ontokrat Done. Hopefully the answer also clears up the question where to call `free_message_c()`, and the thing with `"hello world"`. If not, please post a comment under the answer. – user4815162342 Jan 10 '22 at 19:34
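Coder-256's point about NUL termination suggests an alternative to the pointer/length pair. The sketch below is an assumption, not what the thread settled on: it hands Swift a NUL-terminated copy via `CString::into_raw()` (which Swift could read with `String(cString:)`) and frees it with `CString::from_raw()`. The names `get_message_cstr` and `free_message_cstr` are hypothetical, and `#[no_mangle]` is omitted so the sketch compiles as a plain Rust file; an exported version would add it as the question does.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical variant of `get_message` that returns a NUL-terminated copy.
///
/// # Safety
/// `user_input` must point to a valid NUL-terminated C string.
pub unsafe extern "C" fn get_message_cstr(user_input: *const c_char) -> *mut c_char {
    // Copy the caller's bytes into memory owned by Rust.
    let input = unsafe { CStr::from_ptr(user_input) }
        .to_string_lossy()
        .into_owned();
    // CString::new only fails on an interior NUL byte, which cannot occur
    // here because CStr::from_ptr stops at the first NUL.
    let owned = CString::new(input).expect("no interior NUL byte");
    // Hand ownership to the caller; Rust will not free this allocation.
    owned.into_raw()
}

/// Hypothetical counterpart: must be called exactly once per returned pointer.
///
/// # Safety
/// `ptr` must come from `get_message_cstr` and must not be used afterwards.
pub unsafe extern "C" fn free_message_cstr(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Re-creating the CString lets its destructor free the allocation.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

Unlike the pointer/length pair, the caller here doesn't need to track the length, but it still has to call the free function, since only Rust's allocator may release the memory.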

Answer

The way you construct MessageC is unsound and returns a dangling pointer. The code in get_message() is equivalent to this:

pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
    let _invisible = c_string_safe(user_input);
    let input: &str = &_invisible;
    // let's skip the prints
    let msg = MessageC {
        message_bytes: input.as_ptr(),
        message_len: input.len() as libc::size_t,
    };
    drop(_invisible);
    return msg;
}

Hopefully this formulation highlights the issue: c_string_safe() returns an owned heap-allocated String which gets dropped (and its data deallocated) at the end of the function. input is a slice that refers to the data allocated by that String. In safe Rust you wouldn't be allowed to return a slice such as input that refers to a local variable; you'd have to either return the String itself or limit yourself to passing the slice down to other functions.

However, you're not using safe Rust and you're creating a pointer to the heap-allocated data. Now you have a problem because as soon as get_message() returns, the _invisible String gets deallocated, and the pointer you're returning is dangling. The dangling pointer may even appear to work because deallocation is not obligated to clear the data from memory, it just marks it as available for future allocations. But those future allocations can and will happen, perhaps from a different thread. Thus a program that references freed memory is bound to misbehave, often in an unpredictable fashion - which is precisely what you have observed.
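The drop timing described above can be observed directly. In this sketch, `Owned` is a hypothetical stand-in for the String returned by c_string_safe(); its Drop impl records when it runs, and the hypothetical `demo()` mirrors the `let input: &str = &c_string_safe(user_input);` line from the question. The log shows the owned value is dropped only after the function's last statement, right before the caller regains control, so a raw pointer taken from input and returned would already be dangling.

```rust
use std::cell::RefCell;
use std::ops::Deref;

thread_local! {
    // Records events so the drop order can be inspected afterwards.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(event: &'static str) {
    LOG.with(|l| l.borrow_mut().push(event));
}

/// Hypothetical stand-in for the owned String returned by c_string_safe().
struct Owned(String);

impl Deref for Owned {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

impl Drop for Owned {
    fn drop(&mut self) {
        log("String dropped");
    }
}

fn make_owned() -> Owned {
    Owned(String::from("hello world"))
}

/// Mirrors `let input: &str = &c_string_safe(user_input);`
fn demo() -> usize {
    // The temporary Owned value is kept alive until the end of this
    // function because a reference to it is bound with `let`.
    let input: &str = &make_owned();
    log("slice used");
    let len = input.len();
    log("about to return");
    len
    // The temporary is dropped here, after the last statement.
}
```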

In all-Rust code you'd resolve the issue by safely returning a String instead. But you're doing FFI, so you must reduce the string to a pointer/length pair. Rust allows you to do just that; the easiest way is to call std::mem::forget() to prevent the string from getting deallocated:

pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
    let mut input = c_string_safe(user_input);
    input.shrink_to_fit(); // ensure string capacity == len
    let msg = MessageC {
        message_bytes: input.as_ptr(),
        message_len: input.len() as libc::size_t,
    };
    std::mem::forget(input); // prevent input's data from being deallocated on return
    msg
}

But now you have a different problem: get_message() allocates a string, but how do you deallocate it? Simply dropping MessageC won't do it, because it only contains a raw pointer and a length. (Doing so by implementing Drop would probably be unwise, because you're handing the struct over to Swift or whatever.) The solution is to provide a separate function that re-creates the String from the MessageC and drops it immediately:

pub unsafe fn free_message_c(m: MessageC) {
    // The call to `shrink_to_fit()` above makes it sound to re-assemble
    // the string using a capacity equal to its length
    drop(String::from_raw_parts(
        m.message_bytes as *mut _,
        m.message_len,
        m.message_len,
    ));
}

You should call this function once you're done with MessageC, i.e. when the Swift code has done its job. (You could even make it extern "C" and call it from Swift.)
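Putting the two halves together, here is a hedged, self-contained sketch of the round trip with free_message_c() made extern "C". Assumptions: the producer is synchronous (`get_message_sync` is a hypothetical name, with no tokio runtime), and `usize` replaces `libc::size_t` so no external crate is needed. String::from_raw_parts() requires the capacity to match the original allocation exactly, so the sketch checks capacity == len in debug builds rather than assuming it. Exporting these to Swift would additionally need `#[no_mangle]` and matching declarations in the C header.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

#[derive(Debug)]
#[repr(C)]
pub struct MessageC {
    pub message_bytes: *const u8,
    // Assumption: usize stands in for libc::size_t (the same type on
    // mainstream platforms) so the sketch needs no external crate.
    pub message_len: usize,
}

/// Hypothetical synchronous variant of the answer's `get_message()`.
///
/// # Safety
/// `user_input` must point to a valid NUL-terminated string.
pub unsafe extern "C" fn get_message_sync(user_input: *const c_char) -> MessageC {
    let mut input = unsafe { CStr::from_ptr(user_input) }
        .to_string_lossy()
        .into_owned();
    input.shrink_to_fit();
    // free_message_c() rebuilds the String with capacity == len, so check
    // that assumption in debug builds instead of relying on it silently.
    debug_assert_eq!(input.capacity(), input.len());
    let msg = MessageC {
        message_bytes: input.as_ptr(),
        message_len: input.len(),
    };
    std::mem::forget(input); // Rust no longer frees these bytes on return
    msg
}

/// The answer's `free_message_c()`, made `extern "C"` so Swift could call it.
///
/// # Safety
/// `m` must come from `get_message_sync()` and must be freed exactly once.
pub unsafe extern "C" fn free_message_c(m: MessageC) {
    drop(unsafe {
        String::from_raw_parts(m.message_bytes as *mut u8, m.message_len, m.message_len)
    });
}
```

The usage pattern is: call the producer, read message_len bytes starting at message_bytes (for example with `UnsafeBufferPointer` on the Swift side), then call `free_message_c()` once and never touch the pointer again.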

Finally, using "hello world".as_ptr() directly works because "hello world" is a static &str which is baked into the executable and never gets deallocated. In other words, it doesn't point to a String, it points to some static data that comes with the program.
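The difference in the printed addresses can be reproduced safely. In this sketch (`GREETING`, `static_ptr` and `heap_copy` are hypothetical names), a static item's data has a single fixed address, while each `String::from()` makes a fresh heap allocation with its own address. That is consistent with input.as_ptr() in the question printing a different value on every call, since c_string_safe() builds a new String each time.

```rust
// A `static` item lives for the whole run of the program at one address.
static GREETING: &str = "hello world";

/// Returns the address of the string data baked into the executable.
fn static_ptr() -> *const u8 {
    GREETING.as_ptr()
}

/// Builds a heap-allocated copy and returns it together with its address.
fn heap_copy() -> (String, *const u8) {
    let s = String::from(GREETING);
    let p = s.as_ptr();
    // Returning the String keeps its heap buffer alive (moving a String
    // does not move the bytes it owns), so `p` stays valid for the caller.
    (s, p)
}
```

Two copies that are alive at the same time must occupy different allocations, so their addresses differ even though their contents are equal.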

Answered by user4815162342.