Rust Knowledge Refinement
Serhii Potapov February 07, 2021 #rust
Recently I reread The Rust Book. The last time I read it (more precisely, its first edition) was in 2016. Since then, I have forgotten some things, and others did not exist yet. So, to address my knowledge gaps, I decided to document some discoveries. This blog post is mostly for myself, so that I can come back here and quickly refresh my memory. But you may discover something new and interesting as well.
Copy and Drop relation
A type can implement either the Copy trait or the Drop trait, but not both.
If I think about it for a while, it does make sense: if a type requires a special Drop implementation to clean up resources correctly, a simple memory copy (the Copy trait) would lead to a state where a resource has multiple owners (but we know that the ownership rules allow only one owner).
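A minimal sketch of the distinction, using two hypothetical types (Point and Connection are not from the book, they are only here for illustration):

```rust
// A plain-data type: a bitwise copy is a complete, independent value.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// A type that stands for a resource and needs cleanup.
struct Connection {
    id: u32,
}

impl Drop for Connection {
    fn drop(&mut self) {
        println!("closing connection {}", self.id);
    }
}

// Adding `#[derive(Clone, Copy)]` to `Connection` is rejected by the
// compiler (error E0184) because the type implements `Drop`.

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = a; // copied: `a` is still usable
    assert_eq!(a, b);

    let conn = Connection { id: 7 };
    let moved = conn; // moved: `conn` can no longer be used
    drop(moved); // `Drop::drop` runs exactly once
}
```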
Slices
Type definitions: &str, &[i32], &[f64], etc.
A slice reference occupies two usize values (16 bytes on x86-64): one usize stores a pointer to the data, the other stores the length.
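The size claim can be checked with std::mem::size_of. This is a small sketch; slice_ref_words is a hypothetical helper name:

```rust
use std::mem::size_of;

// Size of a `&[T]` reference, measured in machine words (usize).
fn slice_ref_words<T>() -> usize {
    size_of::<&[T]>() / size_of::<usize>()
}

fn main() {
    // A slice reference is a "fat" pointer: data pointer + length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!(slice_ref_words::<i32>(), 2);
    assert_eq!(slice_ref_words::<f64>(), 2);
    // A reference to a sized type is a single pointer.
    assert_eq!(size_of::<&i32>(), size_of::<usize>());
}
```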
&str slice boundaries
String slice range indices must occur at valid UTF-8 character boundaries. If you attempt to create a string slice in the middle of a multibyte character, your program will exit with an error.
For example, this code executes and prints the Russian letter "П":
let hi = "Привет Мир";
let slice = &hi[0..2]; // first 2 bytes
println!("{}", slice);
But if we replace the second line with let slice = &hi[0..3]; it panics:
thread 'main' panicked at 'byte index 3 is not a char boundary;
it is inside 'р' (bytes 2..4) of `Привет Мир`', src/main.rs:3:1
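When the indices are not known in advance, the panic can be avoided. A sketch using the standard str::get and str::is_char_boundary methods (safe_prefix is a hypothetical helper):

```rust
// Returns the first `end` bytes of `s`, or `None` if `end` is out of
// range or falls inside a multibyte character.
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(0..end)
}

fn main() {
    let hi = "Привет Мир";
    // Each Cyrillic letter here takes 2 bytes in UTF-8.
    assert!(hi.is_char_boundary(2));
    assert!(!hi.is_char_boundary(3));
    assert_eq!(safe_prefix(hi, 2), Some("П"));
    assert_eq!(safe_prefix(hi, 3), None);
    // Working with characters avoids byte arithmetic altogether.
    let first_two: String = hi.chars().take(2).collect();
    assert_eq!(first_two, "Пр");
}
```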
Implicit &String -> &str conversion
Rust is typically very strict about types; however, this code is valid:
let hi: String = String::from("hi");
let slice: &str = &hi;
Here &hi, which has type &String, is assigned to a variable of type &str, and that compiles.
This works thanks to a special rule called Deref coercion.
Struct update syntax
Rust supports the struct update syntax, similar to the one that JavaScript has. I don't happen to use it often, but it could be very helpful in tests.
let espresso = Product { /* all fields set explicitly */ };
let double_espresso = Product { /* overridden fields */ ..espresso };
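A runnable sketch of the syntax. The fields of Product below (name, price_cents, shots) are assumptions made for illustration, not the original definition:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Product {
    name: String,
    price_cents: u32,
    shots: u8,
}

fn main() {
    let espresso = Product {
        name: String::from("Espresso"),
        price_cents: 250,
        shots: 1,
    };
    // Override two fields and take the remaining one from `espresso`.
    let double_espresso = Product {
        name: String::from("Double espresso"),
        shots: 2,
        ..espresso.clone()
    };
    assert_eq!(double_espresso.price_cents, 250);
    assert_eq!(double_espresso.shots, 2);
    assert_eq!(espresso.shots, 1);
}
```

The base is cloned here so that espresso stays fully usable; without the clone, any non-Copy field taken from the base would be moved out of it.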
Iterator::collect()
I always used the Iterator::collect() method to build a vector, but collect() can actually build many other collections that implement the FromIterator trait.
For example, the following code produces a hash map of integers from 1 to 5 and their squares:
use std::collections::HashMap;
let squares: HashMap<i32, i32> = (1..=5).map(|i| (i, i * i)).collect();
println!("{:?}", squares); // {3: 9, 4: 16, 1: 1, 5: 25, 2: 4} (the order is arbitrary)
impl Trait syntax
impl Trait as a function argument
I would typically define a generic function like this:
But Rust also has the impl Trait syntax:
However, I would still prefer the first option, because it better communicates visually the fact that the function is generic.
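The two styles side by side, as a sketch with hypothetical function names (describe_generic and describe_impl):

```rust
use std::fmt::Display;

// Explicit generic parameter with a trait bound.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {}", value)
}

// The same function with `impl Trait` in argument position.
fn describe_impl(value: impl Display) -> String {
    format!("value = {}", value)
}

fn main() {
    assert_eq!(describe_generic(42), "value = 42");
    assert_eq!(describe_impl("hi"), "value = hi");
    // Only the explicit form lets the caller name the type (turbofish).
    assert_eq!(describe_generic::<f64>(1.5), "value = 1.5");
}
```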
impl Trait as a return value
impl Trait can also be used in return position when the returned type is too long to write out manually. It's typically used to return futures or iterators.
In the following example, the function filter_div3() takes an iterator that produces i32, applies an extra filter to it, and returns a new iterator.
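A sketch of what such a function may look like. The body is an assumption based on the name and the description above:

```rust
// Keeps only the numbers divisible by 3. The concrete return type
// (a `Filter` wrapping a closure) stays hidden behind `impl Iterator`.
fn filter_div3(iter: impl Iterator<Item = i32>) -> impl Iterator<Item = i32> {
    iter.filter(|x| x % 3 == 0)
}

fn main() {
    let result: Vec<i32> = filter_div3(1..10).collect();
    assert_eq!(result, vec![3, 6, 9]);
}
```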
This is handy, but it also comes with restrictions: we are not allowed to use the returned value as anything other than an iterator.
E.g. this code does not compile:
println!("{:?}", filter_div3(numbers));
because the compiler has no guarantee that the value returned by filter_div3() implements the Debug trait.
UPDATE (based on the reddit comment):
It is possible to use Debug if you change the return type from impl Iterator<Item=i32> to impl Iterator<Item=i32> + Debug.
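A sketch of that variant, under the assumption that the input iterator is required to implement Debug as well (the standard Filter adapter is Debug only when the iterator it wraps is). The name filter_div3_debug is hypothetical:

```rust
use std::fmt::Debug;

fn filter_div3_debug(
    iter: impl Iterator<Item = i32> + Debug,
) -> impl Iterator<Item = i32> + Debug {
    iter.filter(|x| x % 3 == 0)
}

fn main() {
    let iter = filter_div3_debug(1..10);
    // Compiles now, because `Debug` is part of the declared return type.
    println!("{:?}", iter);
    let result: Vec<i32> = iter.collect();
    assert_eq!(result, vec![3, 6, 9]);
}
```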
Blanket implementations
The concept was known to me, but I was not aware of the term itself.
From The Rust Book:
Implementations of a trait on any type that satisfies the trait bounds are called blanket implementations.
Examples:
- Every type that implements Display automatically gets an implementation of ToString.
- If type A implements From<B>, then type B automatically gets an implementation of Into<A>.
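Both standard examples, plus a home-made blanket implementation, can be sketched as follows. Celsius, Fahrenheit, and the Shout trait are hypothetical names used only for illustration:

```rust
use std::fmt;

struct Celsius(f64);

impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:.1}°C", self.0)
    }
}

struct Fahrenheit(f64);

impl From<Celsius> for Fahrenheit {
    fn from(c: Celsius) -> Self {
        Fahrenheit(c.0 * 9.0 / 5.0 + 32.0)
    }
}

trait Shout {
    fn shout(&self) -> String;
}

// A blanket implementation: every `Display` type gets `shout()`.
impl<T: fmt::Display> Shout for T {
    fn shout(&self) -> String {
        format!("{}!", self).to_uppercase()
    }
}

fn main() {
    // `to_string` comes from the blanket impl of `ToString` for `Display` types.
    assert_eq!(Celsius(21.5).to_string(), "21.5°C");
    // `into` comes from the blanket impl of `Into<U>` for `T` where `U: From<T>`.
    let f: Fahrenheit = Celsius(100.0).into();
    assert_eq!(f.0, 212.0);
    assert_eq!(String::from("hi").shout(), "HI!");
}
```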
Lifetime elision rules
The first rule is:
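Each parameter that is a reference gets its own lifetime parameter.
// E.g. the following function signature
fn foo(x: &i32, y: &i32)
// is treated by the compiler as
fn foo<'a, 'b>(x: &'a i32, y: &'b i32)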
The second rule is:
If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters.
The third rule is:
If there are multiple input lifetime parameters, but one of them is &self or &mut self because this is a method, the lifetime of self is assigned to all output lifetime parameters.
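The three rules can be sketched with a few hypothetical functions (first_word, Parser, and longest are illustrative names):

```rust
// Rules 1 and 2: a single reference parameter, so its lifetime is
// assigned to the returned reference.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// What the compiler infers for `first_word`, written out by hand.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

struct Parser<'a> {
    input: &'a str,
}

impl<'a> Parser<'a> {
    // Rule 3: the output gets the lifetime of `&self`, not of `_prefix`.
    fn rest(&self, _prefix: &str) -> &str {
        self.input
    }
}

// Two reference parameters and no `self`: no rule applies to the
// output, so the lifetime has to be spelled out.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    assert_eq!(first_word("hello world"), "hello");
    assert_eq!(first_word_explicit("hello world"), "hello");
    assert_eq!(Parser { input: "abc" }.rest("a"), "abc");
    assert_eq!(longest("ab", "c"), "ab");
}
```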
Tests
Passing options to cargo test
We can pass options directly to cargo test, for example:
cargo test --help
Or we can pass options to the test binary that cargo test runs:
cargo test -- --help
I often use
cargo test -- --nocapture
to see the output printed by my tests.
Using Result<T, E> in tests
Test functions can be defined to return the Result<T, E> type:
However, I find it too cumbersome.
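A sketch of such a test. The function under test, parse_and_double, is a hypothetical example:

```rust
use std::num::ParseIntError;

fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    Ok(s.trim().parse::<i32>()? * 2)
}

#[cfg(test)]
mod result_tests {
    use super::*;

    // The test fails if it returns `Err`, so `?` can replace `unwrap()`.
    #[test]
    fn doubles_a_number() -> Result<(), ParseIntError> {
        assert_eq!(parse_and_double(" 21 ")?, 42);
        Ok(())
    }
}

fn main() {
    println!("{:?}", parse_and_double("4"));
}
```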
Running tests filtered by name
E.g. run all tests that contain "pattern" in their name:
cargo test pattern
Ignoring specific tests
We can mark tests with #[ignore] to ignore them.
To run exclusively the tests that are marked as ignored:
cargo test -- --ignored
Shared behavior for integration tests
By convention, there is a module common (tests/common/mod.rs) where integration tests should keep their shared behavior.
Read more in the Rust book: submodules in integration tests.
Closures
Closures can capture values from their environment in three ways, which directly map to the three ways a function can take a parameter: taking ownership, borrowing mutably, and borrowing immutably. These are encoded in the three Fn traits as follows:
FnOnce trait
FnOnce consumes the variables it captures from its enclosing scope, known as the closure’s environment. To consume the captured variables, the closure must take ownership of these variables and move them into the closure when it is defined.
The Once part of the name represents the fact that the closure can’t take ownership of the same variables more than once, so it can be called only once.
FnMut trait
FnMut can change the environment because it mutably borrows values.
Fn trait
Fn borrows values from the environment immutably.
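A sketch that exercises all three traits. The helper functions call_fn, call_fn_mut, and call_fn_once are hypothetical:

```rust
fn call_fn(f: impl Fn() -> usize) -> usize {
    f() + f()
}

fn call_fn_mut(mut f: impl FnMut()) {
    f();
    f();
}

fn call_fn_once(f: impl FnOnce() -> String) -> String {
    f()
}

fn main() {
    let name = String::from("Rust");
    // Only reads `name`: the closure implements `Fn`.
    assert_eq!(call_fn(|| name.len()), 8);

    let mut counter = 0;
    // Mutates `counter`: the closure implements `FnMut`, but not `Fn`.
    call_fn_mut(|| counter += 1);
    assert_eq!(counter, 2);

    // Moves `name` out when called: the closure implements only `FnOnce`.
    let owned = call_fn_once(move || name);
    assert_eq!(owned, "Rust");
}
```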
Smart pointers
References VS Pointers
From The Rust Book, chapter 15:
In Rust, which uses the concept of ownership and borrowing, an additional difference between references and smart pointers is that references are pointers that only borrow data; in contrast, in many cases, smart pointers own the data they point to.
Deref coercion
Traits Deref and DerefMut are responsible for dereferencing pointers.
As mentioned earlier, deref coercion is implicit and happens on function calls and variable assignments. What I find interesting is that Rust may perform multiple deref coercions to get the necessary type.
Consider the following code:
use std::ops::Deref;
struct A(i32);
struct B(A);
struct C(B);
An i32 is wrapped by type A, which is wrapped by B, which is wrapped by C. All three wrapper types implement Deref.
When we call print_number(), which expects &i32, and pass &C as the argument, the Rust compiler implicitly calls c.deref().deref().deref(), performing this chain of conversions:
&C -> &B -> &A -> &i32
The output of the little program above is:
deref C to B
deref B to A
deref A to i32
number = 13
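A complete program consistent with the description and the output above might look like this. It is a reconstruction under assumptions, not necessarily the original listing:

```rust
use std::ops::Deref;

struct A(i32);
struct B(A);
struct C(B);

impl Deref for A {
    type Target = i32;
    fn deref(&self) -> &i32 {
        println!("deref A to i32");
        &self.0
    }
}

impl Deref for B {
    type Target = A;
    fn deref(&self) -> &A {
        println!("deref B to A");
        &self.0
    }
}

impl Deref for C {
    type Target = B;
    fn deref(&self) -> &B {
        println!("deref C to B");
        &self.0
    }
}

fn print_number(num: &i32) {
    println!("number = {}", num);
}

fn main() {
    let c = C(B(A(13)));
    // `&C` is coerced to `&i32` through three implicit `deref()` calls.
    print_number(&c);
}
```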
Read more about implicit Deref coercions in The Rust Book.
Rc and reference cycles
Use Rc when it's not possible to determine at compile-time which part of the program will finish using the data last.
Reference Cycles Can Leak Memory:
- Rust’s memory safety guarantees make it difficult, but not impossible, to accidentally create memory that is never cleaned up (a memory leak)
- Creating reference cycles is not easily done, but it’s not impossible either.
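A minimal sketch of such a cycle, using a hypothetical Node type with an interior-mutable next link:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Builds two nodes that point at each other and returns their strong
// reference counts while both local handles are still alive.
fn make_cycle() -> (usize, usize) {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    (Rc::strong_count(&a), Rc::strong_count(&b))
    // When `a` and `b` go out of scope, each count drops to 1, never
    // to 0, so neither node is freed.
}

fn main() {
    assert_eq!(make_cycle(), (2, 2));
}
```

The usual remedy described in the book is to make one direction of the link a Weak reference, which does not keep the value alive.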
Concurrency
Reading from a receiver with a for in loop
I typically used receiver.recv() to read messages from a receiver. But std::sync::mpsc::Receiver implements IntoIterator, meaning that one can use a for in loop, which is much handier:
let (sender, receiver) = channel();
for message in receiver { /* handle message */ }
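A runnable sketch with a producer thread; collect_messages is a hypothetical helper:

```rust
use std::sync::mpsc::channel;
use std::thread;

fn collect_messages() -> Vec<i32> {
    let (sender, receiver) = channel();
    let producer = thread::spawn(move || {
        for i in 1..=3 {
            sender.send(i).unwrap();
        }
        // `sender` is dropped here, which closes the channel.
    });

    let mut received = Vec::new();
    // The loop ends once every sender is dropped and the queue is empty.
    for message in receiver {
        received.push(message);
    }
    producer.join().unwrap();
    received
}

fn main() {
    assert_eq!(collect_messages(), vec![1, 2, 3]);
}
```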
Mutex and interior mutability
I hadn't thought of Mutex in terms of interior mutability, but Mutex<T> provides interior mutability, as the Cell family does.
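A small sketch of what that means in practice: the value is changed through a shared reference (increment is a hypothetical helper):

```rust
use std::sync::Mutex;

// Takes a shared reference, yet still mutates the value inside.
fn increment(counter: &Mutex<i32>) {
    let mut guard = counter.lock().unwrap();
    *guard += 1;
}

fn main() {
    let counter = Mutex::new(0); // note: not declared `mut`
    increment(&counter);
    increment(&counter);
    assert_eq!(*counter.lock().unwrap(), 2);
}
```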
Object safety and traits
Object safety is required for Trait Objects.
You can only make object-safe traits into trait objects. Some complex rules govern all the properties that make a trait object-safe, but in practice, only two rules are relevant. A trait is object-safe if all the methods defined in the trait have the following properties:
- The return type isn't Self.
- There are no generic type parameters.
For example, it's not allowed to have Box<dyn Clone> because Clone::clone() returns Self and therefore Clone is not object-safe.
let clonable: Box<dyn Clone> = Box::new(555i32);
Compilation error:
error[E0038]: the trait `Clone` cannot be made into an object
--> src/main.rs:4:19
|
4 | let clonable: Box<dyn Clone> = Box::new(555i32);
| ^^^^^^^^^^^^^^ `Clone` cannot be made into an object
|
= note: the trait cannot be made into an object because it requires `Self: Sized`
= note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically;
You'll find more about Object Safety in the Rust Reference.
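For contrast, a sketch of a trait that can be used as a trait object. Shape and Square are hypothetical; the method that returns Self is opted out of dynamic dispatch with a where Self: Sized bound:

```rust
trait Shape {
    fn area(&self) -> f64;

    // Returns `Self`, so it is excluded from trait objects by the bound.
    fn unit() -> Self
    where
        Self: Sized;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }

    fn unit() -> Self {
        Square(1.0)
    }
}

fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> =
        vec![Box::new(Square(2.0)), Box::new(Square::unit())];
    assert_eq!(total_area(&shapes), 5.0);
}
```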
Patterns and matching
Pattern matching in Rust is very powerful, and I have realized that usually, I use only about a half of its capabilities.
Ignoring values in destructuring
.. is used to ignore values we're not interested in:
let (first, ..) = (1, 2, 3);
let Person { name, .. } = person;
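A runnable sketch of both forms. The Person fields below are assumptions made for illustration:

```rust
struct Person {
    name: String,
    age: u8,
    email: String,
}

fn main() {
    // Keep the first and the last element, ignore everything in between.
    let (first, .., last) = (1, 2, 3, 4);
    assert_eq!((first, last), (1, 4));

    let person = Person {
        name: String::from("Alice"),
        age: 30,
        email: String::from("alice@example.com"),
    };
    // Bind only `name`, ignore the remaining fields.
    let Person { name, .. } = person;
    assert_eq!(name, "Alice");
}
```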
Multiple match patterns, ranges, guards, and bindings
- | is used to match multiple alternatives
- Ranges can be used as a pattern
- if defines an extra match guard
- @ is used to bind a variable while performing an extra test
Example:
let x = 16;
match x { /* arms combining |, ranges, guards, and @ bindings */ }
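A sketch that combines all four features in one match. The classify function and its arms are hypothetical, not the original example:

```rust
fn classify(x: i32) -> String {
    match x {
        // `|` matches several alternatives.
        0 | 1 => String::from("zero or one"),
        // `@` binds the value, the range tests it, `if` adds a guard.
        n @ 2..=9 if n % 2 == 0 => format!("small even number {}", n),
        // A plain range pattern.
        2..=9 => String::from("small odd number"),
        // `@` combined with alternatives.
        n @ (10 | 16 | 20) => format!("special number {}", n),
        _ => String::from("something else"),
    }
}

fn main() {
    let x = 16;
    assert_eq!(classify(x), "special number 16");
    assert_eq!(classify(4), "small even number 4");
    assert_eq!(classify(3), "small odd number");
}
```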
Refutability
There are two kinds of patterns: refutable and irrefutable.
Irrefutable patterns
Patterns that match any possible value passed are irrefutable.
Example:
let x = 7;
There is nothing that can go wrong with that pattern.
Refutable patterns
Patterns that can fail to match for some possible value are refutable.
Example:
if let Some(value) = option { /* ... */ }
If option were None, the pattern above would not match.
Function parameters, let statements, and for loops can only accept irrefutable patterns, because the program cannot do anything meaningful when values don't match.
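Both kinds in one hypothetical function (describe is an illustrative name):

```rust
fn describe(option: Option<i32>) -> String {
    // Irrefutable: a tuple pattern always matches a tuple.
    let (a, b) = (1, 2);

    // Refutable: `Some(value)` does not match `None`, so it needs
    // `if let` (or `match`) to say what happens otherwise.
    if let Some(value) = option {
        format!("got {}", value + a + b)
    } else {
        String::from("got nothing")
    }

    // Writing `let Some(value) = option;` instead is a compile error
    // (E0005: refutable pattern in local binding).
}

fn main() {
    assert_eq!(describe(Some(4)), "got 7");
    assert_eq!(describe(None), "got nothing");
}
```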
Unsafe
I have to be honest: in the 4 years that I have used Rust for my side projects, I have never felt a need to use unsafe. However, it's good to have a shallow understanding of it.
Unsafe superpowers
- Dereference a raw pointer
- Call an unsafe function or method
- Access or modify a mutable static variable
- Implement an unsafe trait
- Access fields of unions
Raw pointers
Unsafe Rust has two new types called raw pointers: *const T and *mut T.
Different from references and smart pointers, raw pointers:
- Are allowed to ignore the borrowing rules by having both immutable and mutable pointers or multiple mutable pointers to the same location
- Aren't guaranteed to point to valid memory
- Are allowed to be null
- Don't implement any automatic cleanup
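A small sketch of these properties. The function bump_through_raw_pointer is hypothetical, and both pointers are derived from the same mutable borrow so that the example stays well-defined:

```rust
fn bump_through_raw_pointer(mut num: i32) -> i32 {
    // Creating raw pointers is safe; only dereferencing needs `unsafe`.
    let r_mut = &mut num as *mut i32;
    let r_const = r_mut as *const i32; // same location as `r_mut`
    unsafe {
        *r_mut += 1; // write through the mutable pointer
        *r_const // read through the immutable one
    }
}

fn main() {
    assert_eq!(bump_through_raw_pointer(5), 6);

    // Raw pointers are allowed to be null; checking that is safe.
    let null: *const i32 = std::ptr::null();
    assert!(null.is_null());
}
```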
Const VS immutable static var
Constants and immutable static variables might seem similar, but a subtle difference is that values in a static variable have a fixed address in memory. Using the value will always access the same data. Constants, on the other hand, are allowed to duplicate their data whenever they're used.
Unions
Reading union fields requires unsafe.
The primary use case of unions is interoperability with unions in C code.
Advanced Types
Thunk
Thunk is just a new term that I haven't heard before. From Wikipedia:
In computer programming, a thunk is a subroutine used to inject an additional calculation into another subroutine. Thunks are primarily used to delay a calculation until its result is needed, or to insert operations at the beginning or end of the other subroutine.
Never type
Rust has the never type ! (known in some other languages as the empty type). It can be used as the return type of functions that never return. E.g. a function with an endless loop:
fn run_forever() -> ! { loop {} }
Dynamically Sized Types
Dynamically sized types (DSTs) are types whose size is known only at runtime, not at compile time.
For example, str (not &str) is a DST because the size of a string cannot be known at compile time.
The same applies to trait objects: every dyn Trait type is a DST.
We could try to implement a function that takes an argument of type dyn Debug by value:
fn debug(arg: dyn std::fmt::Debug) { /* ... */ }
But it will not compile. Rust tells us explicitly that the size of dyn Debug is not known at compile time:
1 | fn debug(arg: dyn std::fmt::Debug) {
| ^^^ doesn't have a size known at compile-time
|
= help: the trait `Sized` is not implemented for `(dyn Debug + 'static)`
The error message mentions the Sized trait, which is used to determine whether a particular type's size is known at compile time.
In fact, whenever there is a generic function like this one:
fn generic<T>(t: T) {}
Rust sees it as:
fn generic<T: Sized>(t: T) {}
If the Sized restriction needs to be relaxed, a developer must explicitly use ?Sized.
Let's say we want a function that is generic over T, but we pass a reference as the argument instead of the actual value.
Because the size of a reference is always known at compile time, the T: Sized restriction is not needed:
fn generic<T: ?Sized>(t: &T) {}
This way the generic function can be generic over str.
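A sketch that contrasts the two bounds, with hypothetical functions debug_sized and debug_unsized:

```rust
use std::fmt::Debug;

// Implicit `T: Sized` bound.
fn debug_sized<T: Debug>(t: &T) -> String {
    format!("{:?}", t)
}

// `?Sized` lifts the bound, so `T` may be `str`, `[i32]`, or `dyn Debug`.
fn debug_unsized<T: Debug + ?Sized>(t: &T) -> String {
    format!("{:?}", t)
}

fn main() {
    assert_eq!(debug_sized(&42), "42");
    // Here `T = str`; `debug_sized("hi")` would not compile.
    assert_eq!(debug_unsized("hi"), "\"hi\"");
    let slice: &[i32] = &[1, 2];
    assert_eq!(debug_unsized(slice), "[1, 2]");
}
```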
Advanced Functions and Closures
Fn (trait) and fn (function pointer) are different things.
Generally, prefer function interfaces that accept the Fn, FnMut, and FnOnce traits over the fn type, because traits give more flexibility.
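The difference in flexibility, sketched with hypothetical helpers apply_fn_pointer and apply_closure:

```rust
fn add_one(x: i32) -> i32 {
    x + 1
}

// Accepts only function pointers (and non-capturing closures).
fn apply_fn_pointer(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

// Accepts functions and closures alike, capturing or not.
fn apply_closure(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn main() {
    let offset = 10;
    assert_eq!(apply_fn_pointer(add_one, 1), 2);
    // A closure that captures nothing coerces to a function pointer.
    assert_eq!(apply_fn_pointer(|x| x * 2, 4), 8);
    assert_eq!(apply_closure(add_one, 1), 2);
    // A capturing closure works only with the trait-based interface;
    // `apply_fn_pointer(|x| x + offset, 1)` would not compile.
    assert_eq!(apply_closure(|x| x + offset, 1), 11);
}
```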
Macros
Briefly, macros can be divided into the following categories:
- declarative (macro_rules!)
- procedural
  - custom #[derive]
  - attribute-like macros, e.g. #[route(GET, "/")]
  - function-like macros
Function-like macros were a discovery for me. In terms of usage, they are very similar to macro_rules!, but they allow implementing parsers for a completely custom syntax.
I think function-like macros must be a very good fit for DSLs.
For example, this can be totally valid Rust code:
deutsch!(/* arbitrary custom syntax */);
Raw identifiers
Raw identifiers are the syntax that lets you use keywords where they wouldn’t normally be allowed. You use a raw identifier by prefixing a keyword with r#.
For example, normally it's not possible to define a function named match() because match is a keyword used for pattern matching.
However, with raw identifiers one can work around it:
fn r#match() { /* ... */ }
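A runnable sketch along the lines of the example in the Rust Book; the parameters and the body are illustrative:

```rust
// `match` is a keyword, so the function name needs the `r#` prefix
// both at the definition and at every call site.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn main() {
    assert!(r#match("foo", "foobar"));
    assert!(!r#match("baz", "foobar"));
}
```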
Summary
With this article, I just wanted to polish my Rust knowledge. However, if you have discovered something new, I am glad. The article itself is a derivative of The Rust Book, which I encourage you to (re)read.