9

I'm trying to build Octave functions in Rust. Octave's API is in C++, so I've generated bindings using rust-bindgen. I'm currently working through the problems that occur when trying to generate bindings that include std::string. It would be nice if I could leave it as an opaque but valid pointer to a C++ std::string. Would it be possible to build a utility function on the C++ side that I could call any time I needed to pass in a C++ std::string?

I was naive when I first attempted this, and the result is clearly wrong: a Rust std::ffi::CString is for C strings, not C++ strings. I found this recent blog post helpful when comparing the two. My first attempt looks like this:

#![allow(non_snake_case)]
#![allow(unused_variables)]

extern crate octh;

// https://thefullsnack.com/en/string-ffi-rust.html
use std::ffi::CString;

#[no_mangle]
pub unsafe extern "C"  fn Ghelloworld (shl: *const octh::root::octave::dynamic_library, relative: bool) -> *mut octh::root::octave_dld_function {
    let name = CString::new("helloworld").unwrap();
    let pname = name.as_ptr() as *const octh::root::std::string;
    std::mem::forget(pname);

    let doc = CString::new("Hello World Help String").unwrap();
    let pdoc = doc.as_ptr() as *const octh::root::std::string;
    std::mem::forget(pdoc);

    octh::root::octave_dld_function_create(Some(Fhelloworld), shl, pname, pdoc)
}    

pub unsafe extern "C" fn Fhelloworld (args: *const octh::root::octave_value_list, nargout: ::std::os::raw::c_int) -> octh::root::octave_value_list {
    let list_ptr = ::std::ptr::null_mut();
    octh::root::octave_value_list_new(list_ptr);
    ::std::ptr::read(list_ptr)
}

I need to pass the function name and documentation as strings to octave_dld_function_create. I wish there were a CppString that I could use instead. Any suggestions on how to proceed?

Shepmaster
Cameron Taggart
  • C++ compiler/stdlib vendors don't have this figured out; I wouldn't expect Rust to. ;-] (To be clear, `std::string` is a mandated interface, not a mandated implementation, and if you want to pass _anything_ by value you at least need to know its size/layout.) – ildjarn Oct 02 '17 at 11:47
  • I'm trying to interop with GNU Octave on Ubuntu Linux to begin with. The compiler is gcc 6.3.0 from `gcc -dumpversion` and the stdlib is libstdc++.so.6 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 from `ldconfig -p | grep stdc++`. https://stackoverflow.com/a/10355215/23059 – Cameron Taggart Oct 02 '17 at 12:23
  • And is that with or without `_GLIBCXX_USE_CXX11_ABI` defined? ;-] Point being, if it hasn't been abstracted properly in C++-building tools, doing so elsewhere has a very slim chance. E.g. my system has libc++, libstdc++, and Dinkumware stdlibs available. – ildjarn Oct 02 '17 at 12:27
  • I don't know. I see `-lstdc++` referenced in some of the Octave build files. I'm not sure I need something abstract. I can add a function to the Octave libraries when I build it. – Cameron Taggart Oct 02 '17 at 13:06
  • `-lstdc++` is a linker command, which only implies some (shared) object files to link to; because even the C++ compiler doesn't know the answer to the "*what is a `std::string`, anyway?*" question other than the source code you feed to it. – ildjarn Oct 02 '17 at 13:16
  • @ildjarn: Actually, the fact that `std::string` is defined in a library is purely coincidental; it is perfectly fine as far as the Standard is concerned to special-case a standard library type in the compiler. – Matthieu M. Oct 02 '17 at 18:04

2 Answers

9

This is a classic FFI issue, and the solution is to use an "hourglass" design: Language A <=> Common ABI <=> Language B.

It could be possible, of course, to evolve bindgen so that it can faithfully reproduce a C++ ABI, but in practice it would require a full C++ compiler which is probably too much effort.

Using an "hourglass" design, each language with a difficult ABI uses its own specialized toolchain to convert to a specific, well-known ABI. In this case, it would be C++ <=> C <=> Rust.

A possible solution is to create a C wrapper library around the C++ API, and then use bindgen on that. This is what the LLVM and Clang projects do.

It is the simplest solution, and the Octave project may very well be willing to integrate such an octave-c facade in-tree (which is always the best way to guarantee it stays up-to-date).
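To make the C++ <=> C <=> Rust shape concrete, here is a minimal sketch of what the Rust side of such a facade might look like. Everything named here is an assumption for illustration: `CppString`, `cppstring_new`, `cppstring_c_str` and `cppstring_free` are hypothetical facade functions, not part of Octave or bindgen. So that the sketch runs without a C++ toolchain, the three facade functions are stand-ins written in Rust; in a real project they would be implemented in C++ behind `extern "C"` and wrap an actual `std::string`.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Opaque handle for a string owned by the other side of the boundary.
/// Rust never inspects its layout; it only passes the pointer around.
#[repr(C)]
pub struct CppString {
    _private: [u8; 0],
}

// --- Stand-ins for the hypothetical C facade -------------------------------
// In a real build these would be C++ functions exported with extern "C".
// Here they box a CString so that the example is self-contained.

unsafe extern "C" fn cppstring_new(utf8: *const c_char) -> *mut CppString {
    let owned: CString = unsafe { CStr::from_ptr(utf8) }.to_owned();
    Box::into_raw(Box::new(owned)) as *mut CppString
}

unsafe extern "C" fn cppstring_c_str(s: *const CppString) -> *const c_char {
    let inner: &CString = unsafe { &*(s as *const CString) };
    inner.as_ptr()
}

unsafe extern "C" fn cppstring_free(s: *mut CppString) {
    drop(unsafe { Box::from_raw(s as *mut CString) });
}

// --- Safe Rust wrapper around the opaque pointer ----------------------------

/// Owns a `CppString` handle and frees it through the facade on drop.
pub struct OwnedCppString(*mut CppString);

impl OwnedCppString {
    /// Returns `None` if `text` contains an interior NUL byte.
    pub fn new(text: &str) -> Option<Self> {
        let c_text = CString::new(text).ok()?;
        // `c_text` is only borrowed for the call; the facade copies the bytes.
        let raw = unsafe { cppstring_new(c_text.as_ptr()) };
        if raw.is_null() {
            None
        } else {
            Some(OwnedCppString(raw))
        }
    }

    /// Pointer to hand to other facade functions that expect a string handle.
    pub fn as_ptr(&self) -> *const CppString {
        self.0
    }

    /// Copies the contents back into a Rust `String`.
    pub fn to_string_lossy(&self) -> String {
        let c_str = unsafe { CStr::from_ptr(cppstring_c_str(self.0)) };
        c_str.to_string_lossy().into_owned()
    }
}

impl Drop for OwnedCppString {
    fn drop(&mut self) {
        unsafe { cppstring_free(self.0) }
    }
}
```

With a wrapper along these lines, a call such as `OwnedCppString::new("helloworld")` yields a handle whose `as_ptr()` can be passed wherever the facade expects a string, and the allocation is released by whichever side created it.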


Another solution would be to create a C++ companion library for bindgen, which takes care of providing a C-ABI for common C++ types (such as std::string). This would be a more difficult endeavor, especially since:

  • C has no generics, thus C++ templates would have to be either out-of-scope, or pre-instantiated templates would have to be wrapped one at a time,
  • C does not know how to invoke move or copy constructors, so unless the C++ types are already PODs, they have to be manipulated through opaque pointers,
  • C does not know ...
Shepmaster
Matthieu M.
0

I came across this same issue when trying to call some C++ functions with std::string in their parameters (as both input and output values) from Rust code.

The thing that helped me the most was finding this SO post that says:

C++ strings have a constructor that lets you construct a std::string directly from a C-style string:

const char* myStr = "This is a C string!";
std::string myCppString = myStr;

Or, alternatively:

std::string myCppString = "This is a C string!";

I tested this in a C++ program to make sure that it worked through a function call:

#include <iostream>
#include <string>

// Takes a std::string by value; a const char* argument is converted implicitly.
void print_str(std::string myStr){
    std::cout << myStr << std::endl;
}

int main(void)
{
    const char* myStr = "This is a C string!";
    // std::string myCppString = myStr;
    print_str(myStr);
}

I also found this blog post incredibly helpful: it explained enough about the code I needed to write on the Rust side for me to piece it together. The code I ended up using was very similar to the example from the blog post at the end of the section titled "1.3 std::mem::forget it to keep it":

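use std::ffi::CString;
use std::os::raw::c_char;

pub extern "C" fn string_from_rust() -> *const c_char {
    let s = CString::new("Hello World").unwrap();
    let p = s.as_ptr();
    // Leak the CString so its buffer outlives this function;
    // otherwise `p` would dangle as soon as `s` is dropped.
    std::mem::forget(s);
    p
}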
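As an alternative to `as_ptr` followed by `std::mem::forget`, the standard library provides `CString::into_raw` and `CString::from_raw` for handing a string's ownership across the boundary and taking it back later. The sketch below shows that round trip; the function names `string_into_raw` and `string_reclaim` are made up for illustration.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Gives up ownership of the string and returns a raw pointer to it.
/// The buffer stays allocated until it is handed back to `string_reclaim`.
pub fn string_into_raw(s: &str) -> *mut c_char {
    CString::new(s)
        .expect("string must not contain an interior NUL byte")
        .into_raw()
}

/// Takes ownership back so the allocation is freed by Rust.
///
/// Safety: `p` must come from `string_into_raw` and must not be used again.
pub unsafe fn string_reclaim(p: *mut c_char) -> String {
    let owned = unsafe { CString::from_raw(p) };
    owned
        .into_string()
        .expect("bytes were created from a &str, so they are valid UTF-8")
}
```

The pointer returned by `into_raw` should only ever be released by passing it back to `from_raw`; freeing it from the C or C++ side would mix allocators.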

Knowing this, I could now use CString to go from Rust string types to C-style strings, and then pass those as parameters to the extern "C" block that bindgen generated (mostly-automatically). C++ would then see that I was giving it C-style strings in place of std::string and use its built-in constructor to convert them. No additional C/C++ wrappers required!

About that "mostly-automatically" bit: maybe I'm just inexperienced with bindgen, but to get this to work I ended up modifying the generated bindings, changing the types that bindgen generated for the C++ std::string parameters to *const c_char (i.e. C strings). Also, since I was working with function parameters rather than return types as the blog post was, the last line of my code had an unsafe block containing a call to the function, which looked more like some of the examples from the CString documentation.
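For the parameter-passing case described here, the call site might look roughly like the sketch below. It follows the pattern from the CString documentation: build a `CString`, keep it alive for the duration of the call, and pass `as_ptr()` inside an `unsafe` block. The foreign function is an assumption: `foreign_name_len` is a hypothetical stand-in written in Rust so that the example runs on its own, whereas in practice it would be a declaration from the edited bindgen output.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical stand-in for a foreign function that takes a C string.
// It simply reports the length of the string it receives.
unsafe extern "C" fn foreign_name_len(name: *const c_char) -> usize {
    unsafe { CStr::from_ptr(name) }.to_bytes().len()
}

/// Converts a Rust string to a C string and passes it to the foreign function.
/// Returns `None` if `name` contains an interior NUL byte.
pub fn call_with_str(name: &str) -> Option<usize> {
    let c_name = CString::new(name).ok()?;
    // `c_name` lives until the end of this function, so the pointer
    // stays valid for the whole call.
    Some(unsafe { foreign_name_len(c_name.as_ptr()) })
}
```

Because the `CString` is a named local rather than a temporary, nothing needs to be leaked here: the buffer is freed after the foreign call returns.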

MoralCode