- Hello from Rust example
- Naming conventions
- Representing plain enums
- Dealing with structures
- Complex enums?
DISCLAIMER: I am not a professional C/C++ developer, which means:
- I will describe some things that may look very obvious.
- The outcome will probably not be 100% idiomatic C code.
- If you know how some things can be done better, please let me know in a comment.
First, let’s make a minimal C program that calls Rust.
Add this to `Cargo.toml`:
```toml
[lib]
name = "whatlang"
crate-type = ["staticlib", "cdylib"]
```
It tells Cargo to build both a static library (`staticlib`, a `.a` file) and a dynamic library with a C-compatible interface (`cdylib`, a `.so` file on Linux).
In `src/lib.rs` we implement a small function that prints a message to stdout:
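The original listing did not survive extraction. A minimal sketch consistent with the `print_hello_from_rust` symbol and the `Hello from Rust` output shown further down could look like this. It spells out `extern "C"`, which is what a bare `extern` means; it assumes the 2021 edition, since on the 2024 edition the attribute must be written `#[unsafe(no_mangle)]`.

```rust
// Exported with an unmangled symbol name so C can link against it.
#[no_mangle]
pub extern "C" fn print_hello_from_rust() {
    // Rust's println! writes to the process's stdout, which C shares.
    println!("Hello from Rust");
}
```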
Regarding `#[no_mangle]` and `extern`, let me quote the FFI with Haskell and Rust article:
The #[no_mangle] tells the Rust compiler not to do anything weird with the symbols of this function when compiled because we need to be able to call it from other languages.
This is needed if you plan on doing any FFI. Not doing so means you won’t be able to reference it in other languages
extern means this is externally available outside our library and tells the compiler to follow the C calling convention when compiling
Let’s compile our lib:
cargo build --release
With the `nm` tool we can check that `libwhatlang.so` really contains the `print_hello_from_rust` function:
```
$ nm -D ./target/release/libwhatlang.so | grep hello
0000000000003190 T print_hello_from_rust
```
Then we need a `src/whatlang.h` header file with the function declaration:
And finally, the C program itself (we put it into `examples/hello.c`):
Now we can compile it:
```
gcc -o ./examples/hello ./examples/hello.c -Isrc -L. -l:target/release/libwhatlang.so
```
This produces the `examples/hello` binary, which we can run:
```
$ ./examples/hello
Hello from Rust
```
During development we’ll likely need to recompile and run the program frequently. To automate this, let’s create a `Makefile` with a few commands:
Now we can run `make run` to recompile `hello.c` and run it.
In the rest of the article I’ll go through common problems, design decisions and pitfalls I faced.
Since C does not have namespaces (some people may disagree), I had to stick to some rules to avoid name collisions with other libraries and confusion:
- Every function, type or constant starts with the `whatlang` prefix (e.g. `whatlang_detect`).
- If a function is associated with a particular format then its name has format
Similar rules apply to everything else. It may seem too verbose, but as far as I can see, it’s a pretty common approach in many C libraries.
`Lang` represents 83 different languages, so it can be encoded with 1 byte.
That was my initial assumption, and it seemed to be correct until I later uncovered some bugs.
To figure out how it’s encoded, I decided to convert the `Lang` enum into `u8` with the `std::mem::transmute` function:
and got the following error:
```
= note: source type: whatlang::Lang (32 bits)
= note: target type: u8 (8 bits)
```
Wow! So the enum actually takes 4 bytes instead of 1. Transmuting into `u32` makes things work as expected:
lang_int = 14
Now it makes sense, because English is in the 15th position in the `Lang` declaration (remember, counting starts from 0).
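As an aside, `transmute` is not needed just to read the discriminant: a fieldless enum can be cast to an integer with `as`, which compiles whatever size the compiler picked for the enum. A small sketch, using a hypothetical three-variant `DemoLang` rather than the real `Lang`:

```rust
// Hypothetical, shortened stand-in for whatlang's `Lang` enum.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DemoLang {
    Epo, // discriminant 0
    Eng, // discriminant 1
    Rus, // discriminant 2
}

// An `as` cast reads the discriminant without caring how many bytes
// the enum occupies in memory, unlike `std::mem::transmute`.
pub fn demo_lang_code(lang: DemoLang) -> u32 {
    lang as u32
}
```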
So in C such an enum can be mapped to the `uint32_t` type from `stdint.h`.
To define all the languages, I ended up with a list of constants like this:
It’s quite verbose, so one would rather use a scripting language to generate such boilerplate code.
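For instance, a tiny generator could produce the C definitions from the list of language codes. The sketch below is written in Rust (it could just as well live in a build script); the `generate_c_constants` name, the `#define` form and the `WHATLANG_` prefix are all assumptions, not the constants from the original header.

```rust
// Hypothetical generator: one C constant per language code, where the
// value is the code's position in the list (matching enum discriminants).
pub fn generate_c_constants(codes: &[&str]) -> String {
    let mut out = String::new();
    for (index, code) in codes.iter().enumerate() {
        out.push_str(&format!(
            "#define WHATLANG_{} {}\n",
            code.to_uppercase(),
            index
        ));
    }
    out
}
```

The list has to stay in the same order as the variants of `Lang`, otherwise the generated numbers drift away from the real discriminants.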
UPDATE: later I figured out that without `#[repr(C)]` Rust optimizes memory and uses 1 byte for the `Lang` enum. So `uint32_t` can be replaced with `uint8_t`. This should work as long as the number of enum variants does not exceed 256.
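The sizes can be checked directly with `std::mem::size_of`. Note that Rust does not document the layout of an enum without a `repr` attribute, so the 1-byte size is an observation rather than a guarantee; `#[repr(u8)]` turns it into one. A sketch with three hypothetical enums:

```rust
use std::mem::size_of;

// No `repr`: layout is up to the compiler (currently 1 byte here).
#[allow(dead_code)]
pub enum DefaultRepr { A, B, C }

// `repr(C)`: same size as a C enum, i.e. a C `int` (4 bytes on
// mainstream platforms), which matches the 32-bit transmute error.
#[allow(dead_code)]
#[repr(C)]
pub enum CRepr { A, B, C }

// `repr(u8)`: exactly one byte, matching `uint8_t` on the C side.
#[allow(dead_code)]
#[repr(u8)]
pub enum U8Repr { A, B, C }

pub fn enum_sizes() -> (usize, usize, usize) {
    (size_of::<DefaultRepr>(), size_of::<CRepr>(), size_of::<U8Repr>())
}
```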
Returning a structure from a function
From the C side it’s relatively simple: just pass a pointer to a string, like it’s done here with the argument
On the Rust side you’ll need to convert the pointer into a `String` so you can manipulate the data as a string.
In the example above the function accepts a raw pointer `*const c_char` (it can also be `*mut c_char` if you need to mutate the data).
Then we transform it into a `CStr` by calling the unsafe method `CStr::from_ptr`.
Finally, we call `CStr::to_str(&self)`, which converts the C string into a `&str`. This operation may fail if the C string does not contain a valid UTF-8 sequence.
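Putting those steps together, a minimal sketch of a function that receives a C string might look like this. The name `whatlang_example_char_count` and the `-1` error convention are hypothetical, and the function only counts characters, to keep the focus on the `CStr::from_ptr` / `to_str` conversion. It is declared `unsafe` because Rust cannot verify the pointer it receives.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical export: number of Unicode characters in a NUL-terminated
// C string, or -1 for a null pointer or bytes that are not valid UTF-8.
#[no_mangle]
pub unsafe extern "C" fn whatlang_example_char_count(text: *const c_char) -> i64 {
    if text.is_null() {
        return -1;
    }
    // SAFETY: the caller promises `text` points to a valid,
    // NUL-terminated string that outlives this call.
    let c_str = unsafe { CStr::from_ptr(text) };
    // `to_str` returns an error if the bytes are not valid UTF-8.
    match c_str.to_str() {
        Ok(s) => s.chars().count() as i64,
        Err(_) => -1,
    }
}
```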
`Lang` provides some methods that return static strings, like `eng_name()` to get the language name in English.
My first thought was “I can just return a raw pointer to the string”, so the initial solution looked like this:
C function declaration:
But there is a problem. C expects strings to be terminated with the `\0` character, while Rust stores strings differently: a `&str` is a pointer plus a length, and its bytes are not followed by `\0`.
I expected the output to be simply `Russian`, but instead C kept reading the static data stored after it, printing a large chunk of it until it happened to hit a `\0` byte.
So I decided that I actually needed to copy the string from Rust static memory and make sure that `\0` is appended.
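The missing terminator can be confirmed from Rust. The sketch also shows an alternative that this article does not use: text known at compile time can be stored with an explicit `\0`, so a pointer to it is already a valid C string and needs no copy. The `RUSSIAN_NUL` static and the `whatlang_example_static_name` function are hypothetical, and this would require NUL-terminated names inside the library itself.

```rust
use std::os::raw::c_char;

// A `&str` carries its length separately, so no trailing NUL is stored.
pub fn str_has_trailing_nul(s: &str) -> bool {
    s.as_bytes().last() == Some(&0)
}

// Static bytes with an explicit terminator form a valid C string.
static RUSSIAN_NUL: &[u8] = b"Russian\0";

// Hypothetical export: the pointer stays valid for the whole program
// because the data is static and never freed.
#[no_mangle]
pub extern "C" fn whatlang_example_static_name() -> *const c_char {
    RUSSIAN_NUL.as_ptr() as *const c_char
}
```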
So I came up with the following function:
Now the user needs to pass a pointer to a buffer where the result must be written.
First, we convert the `&str` into a `CString`. Then we use `libc::strcpy` from the `libc` crate to copy the string into the buffer.
NOTE: it’s the caller’s responsibility to ensure that the buffer is big enough (at least 30 bytes).
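The buffer-size requirement can also be enforced on the Rust side. This sketch is a variant, not the article’s function: it adds a hypothetical `buffer_len` parameter and copies with `std::ptr::copy_nonoverlapping` instead of `libc::strcpy`, so no extra crate is needed. The hard-coded `"Russian"` stands in for `eng_name()`.

```rust
use std::os::raw::c_char;
use std::ptr;

// Hypothetical export: copies the name plus a terminating NUL into the
// caller's buffer. Returns 0 on success, 1 if the buffer is null or too small.
#[no_mangle]
pub unsafe extern "C" fn whatlang_example_copy_name(
    buffer: *mut c_char,
    buffer_len: usize,
) -> i32 {
    let name: &str = "Russian"; // stands in for `lang.eng_name()`
    let bytes = name.as_bytes();
    if buffer.is_null() || buffer_len < bytes.len() + 1 {
        return 1;
    }
    // SAFETY: the caller guarantees `buffer` points to `buffer_len`
    // writable bytes, and we checked that name + NUL fits.
    unsafe {
        ptr::copy_nonoverlapping(bytes.as_ptr() as *const c_char, buffer, bytes.len());
        *buffer.add(bytes.len()) = 0; // append the terminating NUL
    }
    0
}
```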
I have the following flat Rust structure:
(both `Lang` and `Script` are plain enums), and it easily maps to a C structure:
It could be slightly more complex with nested structures, but the idea stays the same.
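For reference, a minimal sketch of the Rust side of such a mapping. `#[repr(C)]` is what makes Rust lay the fields out in declaration order with C alignment rules; without it the compiler may reorder them. The `WhatlangExampleInfo` name and its two `u8` fields (holding the `Lang` and `Script` discriminants) are assumptions based on the description above, not the library’s actual definition.

```rust
// Hypothetical flat structure shared with C. With `#[repr(C)]` it
// matches this C definition:
//
//     typedef struct {
//         uint8_t lang;   /* Lang discriminant */
//         uint8_t script; /* Script discriminant */
//     } whatlang_example_info;
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WhatlangExampleInfo {
    pub lang: u8,
    pub script: u8,
}
```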
I guess it can be done in at least a few different ways. The way I do it: the function receives a pointer to preallocated memory for the structure as one of its arguments.
You’ve already seen the `whatlang_detect` function above: `info` is a pointer where the result must be written in case of success (`0` is returned).
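The same out-parameter pattern in a self-contained sketch. The hypothetical `whatlang_example_first_char` reports the first character of the text instead of detecting a language, but it follows the same contract: write into `info` and return `0` on success, return `1` when there is no result.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical result structure written by Rust into caller memory.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct FirstCharInfo {
    pub code_point: u32, // Unicode scalar value of the first character
    pub utf8_len: u8,    // number of bytes it takes in UTF-8
}

// Returns 0 after writing to `info`, or 1 for null pointers, invalid
// UTF-8 or an empty string (in which case `info` is left untouched).
#[no_mangle]
pub unsafe extern "C" fn whatlang_example_first_char(
    text: *const c_char,
    info: *mut FirstCharInfo,
) -> i32 {
    if text.is_null() || info.is_null() {
        return 1;
    }
    // SAFETY: the caller passes a valid NUL-terminated string.
    let text = match unsafe { CStr::from_ptr(text) }.to_str() {
        Ok(s) => s,
        Err(_) => return 1,
    };
    match text.chars().next() {
        Some(ch) => {
            // SAFETY: `info` points to writable memory for one struct.
            unsafe {
                info.write(FirstCharInfo {
                    code_point: ch as u32,
                    utf8_len: ch.len_utf8() as u8,
                });
            }
            0
        }
        None => 1,
    }
}
```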
Another way to do this is to return a pointer directly:
In this case the Rust function must return a boxed structure:
NOTE: In this approach the memory for the structure is allocated by Rust, but it’s the C program’s responsibility to free it.
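A sketch of the boxed variant with hypothetical names. `Box::into_raw` hands ownership to C, and a matching exported function rebuilds the `Box` with `Box::from_raw` so that Rust’s allocator releases the memory. This is safer than calling C’s `free()`, which is not guaranteed to match the allocator Rust used.

```rust
// Hypothetical structure returned by pointer.
#[repr(C)]
#[derive(Debug)]
pub struct WhatlangExampleInfo {
    pub lang: u8,
    pub script: u8,
}

// Allocates on the Rust heap and transfers ownership to the caller.
#[no_mangle]
pub extern "C" fn whatlang_example_info_new(lang: u8, script: u8) -> *mut WhatlangExampleInfo {
    Box::into_raw(Box::new(WhatlangExampleInfo { lang, script }))
}

// Gives the memory back to Rust; C calls this instead of `free()`.
#[no_mangle]
pub unsafe extern "C" fn whatlang_example_info_free(info: *mut WhatlangExampleInfo) {
    if !info.is_null() {
        // SAFETY: `info` came from `whatlang_example_info_new` and has
        // not been freed yet; dropping the rebuilt Box deallocates it.
        drop(unsafe { Box::from_raw(info) });
    }
}
```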
There is also something I don’t know how to do properly: how do you represent a complex Rust enum in C?
Therefore I also don’t know how to gracefully represent `Option`.
Maybe it’s not actually necessary. For now, as a workaround, my function returns `0` in case of `Some` and `1` in case of `None`. In the `Some` case it writes the result to preallocated memory at the given pointer.
But I would appreciate if you share some other insights about this.
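One common approach, not from the original article, is a tagged union: an integer tag saying which variant is active plus a `union` of per-variant `#[repr(C)]` payloads, which C can mirror as `struct { uint32_t tag; union { ... } payload; }`. Rust can also produce an equivalent layout automatically for a data-carrying enum marked `#[repr(C)]`. A sketch with a hypothetical `Shape` enum:

```rust
// Hypothetical data-carrying Rust enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

// C-compatible payload for each variant.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct CircleData { pub radius: f64 }

#[repr(C)]
#[derive(Clone, Copy)]
pub struct RectData { pub width: f64, pub height: f64 }

// Holds exactly one payload at a time, like a C union.
#[repr(C)]
#[derive(Clone, Copy)]
pub union ShapePayload {
    pub circle: CircleData,
    pub rect: RectData,
}

pub const SHAPE_CIRCLE: u32 = 0;
pub const SHAPE_RECT: u32 = 1;

// The tag tells C which union field is valid to read.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ShapeFfi {
    pub tag: u32,
    pub payload: ShapePayload,
}

impl From<Shape> for ShapeFfi {
    fn from(shape: Shape) -> Self {
        match shape {
            Shape::Circle { radius } => ShapeFfi {
                tag: SHAPE_CIRCLE,
                payload: ShapePayload { circle: CircleData { radius } },
            },
            Shape::Rect { width, height } => ShapeFfi {
                tag: SHAPE_RECT,
                payload: ShapePayload { rect: RectData { width, height } },
            },
        }
    }
}
```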
There are also some things that are not covered in this article, like tuples and arrays, but you may still get some ideas from it.
We’ve seen how to create C bindings for a Rust library. It may not be something you do often, but having such an option is always nice. It also means that Rust libraries can be ported to plenty of other languages that have FFI support, and that sounds really cool!
Thanks for reading. Below you’ll find some useful links that helped me during this investigation.
People on Reddit gave me very good, constructive feedback. I did some things wrong here, so I highly recommend reading this comment as well.
- The Rust Book, Foreign Function Interface - a section in the first edition of The Rust Book about how to do FFI.
- The Nomicon book - entire book dedicated to unsafe programming in Rust.
- libc - a crate that allows calling C functions from Rust. Here you’ll find C type definitions, constants and standard functions.
- Rust FFI: Sending strings to the outside world - this article explains how to expose Rust strings for NodeJS.
- FFI with Haskell and Rust - yet another blog article about FFI
- Using unsafe tricks to examine Rust data structure layout
- Complex types with Rust’s FFI
- netinfo-ffi - was my initial source of inspiration, where I could find some examples.
- https://github.com/greyblake/whatlang-ffi - whatlang C bindings described in this post