Builder with typestate in Rust
Serhii Potapov October 25, 2021 #rust #patterns
The problem
In the previous article I covered the builder pattern.
Here is the code snippet that implements UserBuilder for the User structure:
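A minimal sketch of such a builder could look as follows. The field types and the `impl Into<String>` parameters are assumptions made for illustration; the code in the previous article may differ in detail.

```rust
#[derive(Debug, PartialEq)]
struct User {
    id: String,
    email: String,
    first_name: Option<String>,
    last_name: Option<String>,
}

struct UserBuilder {
    id: String,
    email: String,
    first_name: Option<String>,
    last_name: Option<String>,
}

impl User {
    // Mandatory fields are positional arguments: nothing ties a value to its name.
    fn builder(id: impl Into<String>, email: impl Into<String>) -> UserBuilder {
        UserBuilder {
            id: id.into(),
            email: email.into(),
            first_name: None,
            last_name: None,
        }
    }
}

impl UserBuilder {
    // Optional fields get chainable setters.
    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }

    fn last_name(mut self, last_name: impl Into<String>) -> Self {
        self.last_name = Some(last_name.into());
        self
    }

    fn build(self) -> User {
        let Self { id, email, first_name, last_name } = self;
        User { id, email, first_name, last_name }
    }
}
```

With this shape, both `User::builder("13", "example@example.com")` and the swapped `User::builder("example@example.com", "13")` type-check, because both arguments are string-like.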
The builder is expected to be used like this:
let greyblake = User::builder("13", "example@example.com")
    .first_name("Serhii")
    .build();
Notice that the id and email fields are mandatory and have no defaults, so they have to be passed to the User::builder() function. Unfortunately, this breaks the elegance of the builder: the names of the mandatory fields are not explicitly bound to their values, and it is easy to screw up by passing arguments in the wrong order when they have the same type, e.g.:
User::builder("example@example.com", "13")
Would it not be awesome if we could set all values in the same fashion?
let greyblake = User::builder()
    .id("13")
    .email("example@example.com")
    .first_name("Serhii")
    .build();
But at the same time, we'd like to keep the API type-safe: if the builder is misused to construct an invalid user, we want to see a compile error.
For example, the following usage should not be allowed, because the id field is missing:
let greyblake = User::builder()
    .email("example@example.com")
    .first_name("Serhii")
    .build();
Can we do something like that? Yes! 🦄
NOTE: The problem can also be solved with newtypes, and generally using newtypes is a very good idea, but today we'll stay focused on the builder.
Naive approach
If we think about it for a while, our builder can be in one of the following 4 states:
- new (id and email are missing)
- id is set
- email is set
- complete (id and email are set)
We can introduce 4 builder types, one for each of these states respectively:
- UserBuilderNew
- UserBuilderWithId
- UserBuilderWithEmail
- UserBuilderComplete
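This would be some sort of state machine: from UserBuilderNew, calling .id() leads to UserBuilderWithId and calling .email() leads to UserBuilderWithEmail; setting the remaining field from either of those leads to UserBuilderComplete, the only state that offers .build().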
Turned into code (see the playground), this works as intended. The following snippet compiles:
let greyblake = User::builder()
    .id("13")
    .email("example@example.com")
    .first_name("Serhii")
    .build();
While this one fails:
let greyblake = User::builder()
    .email("example@example.com") // <-- id is not specified
    .first_name("Serhii")
    .build();
Error message:
.build();
^^^^^ method not found in `UserBuilderWithEmail`
UserBuilderWithEmail must first be turned into UserBuilderComplete by setting the id with .id(); only after that can a user be built.
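For reference, the four-type approach can be condensed into the sketch below. It is an illustration under assumptions rather than the playground code: only first_name is kept as an optional field to save space, and the `impl Into<String>` setters are a choice made here.

```rust
#[derive(Debug, PartialEq)]
struct User {
    id: String,
    email: String,
    first_name: Option<String>,
}

// One struct per state; each holds only what has been provided so far.
struct UserBuilderNew { first_name: Option<String> }
struct UserBuilderWithId { id: String, first_name: Option<String> }
struct UserBuilderWithEmail { email: String, first_name: Option<String> }
struct UserBuilderComplete { id: String, email: String, first_name: Option<String> }

impl User {
    fn builder() -> UserBuilderNew {
        UserBuilderNew { first_name: None }
    }
}

impl UserBuilderNew {
    fn id(self, id: impl Into<String>) -> UserBuilderWithId {
        UserBuilderWithId { id: id.into(), first_name: self.first_name }
    }
    fn email(self, email: impl Into<String>) -> UserBuilderWithEmail {
        UserBuilderWithEmail { email: email.into(), first_name: self.first_name }
    }
    fn first_name(mut self, name: impl Into<String>) -> Self {
        self.first_name = Some(name.into());
        self
    }
}

impl UserBuilderWithId {
    fn email(self, email: impl Into<String>) -> UserBuilderComplete {
        UserBuilderComplete { id: self.id, email: email.into(), first_name: self.first_name }
    }
    // Same body as in `UserBuilderNew`: this is the duplication.
    fn first_name(mut self, name: impl Into<String>) -> Self {
        self.first_name = Some(name.into());
        self
    }
}

impl UserBuilderWithEmail {
    fn id(self, id: impl Into<String>) -> UserBuilderComplete {
        UserBuilderComplete { id: id.into(), email: self.email, first_name: self.first_name }
    }
    fn first_name(mut self, name: impl Into<String>) -> Self {
        self.first_name = Some(name.into());
        self
    }
}

impl UserBuilderComplete {
    fn first_name(mut self, name: impl Into<String>) -> Self {
        self.first_name = Some(name.into());
        self
    }
    // Only the complete state can produce a `User`.
    fn build(self) -> User {
        User { id: self.id, email: self.email, first_name: self.first_name }
    }
}
```

The setters can be called in any order, but `build()` exists only on `UserBuilderComplete`, so a chain that skips `.id()` or `.email()` is rejected by the compiler.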
Although it works, this approach is not very good.
First, there is a lot of boilerplate and duplication: first_name() and last_name() had to be implemented 4 times, once for every variant of the builder. Second, it does not scale: the boilerplate grows exponentially, because n mandatory fields require 2^n builder types, so every new mandatory field doubles their number.
Generic builder
To eliminate the duplication we're going to make the builder generic. In particular, we're going to use a technique called typestate. Let me quote Cliffle here:
Typestates are a technique for moving properties of state (the dynamic information a program is processing) into the type level (the static world that the compiler can check ahead-of-time).
The special case of typestates that interests us here is the way they can enforce run-time order of operations at compile-time.
That's it. We want id() and email() to be called before build() can be called.
Let's redefine our builder to be generic over I and E. I and E are type placeholders that represent the state of the id and email fields, respectively.
The id field can either be set as a string or be missing. The same applies to email. Let's define simple types to reflect this:
// types for `id`
struct Id(String);
struct NoId;
// types for `email`
struct Email(String);
struct NoEmail;
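What we actually want is the same state machine as before, but expressed with generics: UserBuilder<NoId, NoEmail> is the starting state, UserBuilder<Id, NoEmail> and UserBuilder<NoId, Email> are the intermediate states, and UserBuilder<Id, Email> is the complete state.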
When User::builder() is called, neither id nor email is provided yet, so a value of type UserBuilder<NoId, NoEmail> should be returned.
When .id() is invoked, regardless of what email is, we just set the id field and preserve the email's type and value without any changes. Pay attention to E: the method is implemented for any UserBuilder<I, E> and returns UserBuilder<Id, E>, so the same E appears on both sides.
Thanks to generics, this implementation enables 2 potential transitions:
- UserBuilder<NoId, NoEmail> -> UserBuilder<Id, NoEmail>
- UserBuilder<NoId, Email> -> UserBuilder<Id, Email>
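The two transitions can be seen in isolation in the following stripped-down sketch. It keeps only the two mandatory fields, and `new()` is a hypothetical stand-in for `User::builder()`; both simplifications are assumptions made to keep the example short.

```rust
// Marker types describing whether each mandatory field has been provided.
struct NoId;
struct Id(String);
struct NoEmail;
struct Email(String);

// The type parameters record the state of `id` and `email`.
struct UserBuilder<I, E> {
    id: I,
    email: E,
}

impl UserBuilder<NoId, NoEmail> {
    // Hypothetical constructor standing in for `User::builder()`.
    fn new() -> Self {
        UserBuilder { id: NoId, email: NoEmail }
    }
}

impl<I, E> UserBuilder<I, E> {
    // `I` is replaced by `Id`; `E` and the email value pass through untouched.
    fn id(self, id: impl Into<String>) -> UserBuilder<Id, E> {
        UserBuilder { id: Id(id.into()), email: self.email }
    }
}
```

Because the impl is written for any `I`, it also permits calling `.id()` on a builder that already holds an `Id`. Implementing the method only for `UserBuilder<NoId, E>` would be one way to forbid setting the id twice.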
Symmetrically, we define .email(), which returns UserBuilder<I, Email> and leaves the id state untouched.
We also have to define .first_name() and .last_name() for all 4 possible variants, so we implement them once, generically over I and E.
Finally, what remains is to define .build(). Of course, we want it to exist only for the type UserBuilder<Id, Email>, when both id and email are set.
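Putting the pieces described so far together gives roughly the following builder. This is a sketch under assumptions (field types, `impl Into<String>` setters, destructuring style), not necessarily the exact code from the playground.

```rust
#[derive(Debug, PartialEq)]
struct User {
    id: String,
    email: String,
    first_name: Option<String>,
    last_name: Option<String>,
}

struct NoId;
struct Id(String);
struct NoEmail;
struct Email(String);

struct UserBuilder<I, E> {
    id: I,
    email: E,
    first_name: Option<String>,
    last_name: Option<String>,
}

impl User {
    // Nothing is set yet, so the builder starts as `UserBuilder<NoId, NoEmail>`.
    fn builder() -> UserBuilder<NoId, NoEmail> {
        UserBuilder { id: NoId, email: NoEmail, first_name: None, last_name: None }
    }
}

impl<I, E> UserBuilder<I, E> {
    // Changes the `id` state, keeps the `email` state `E`.
    fn id(self, id: impl Into<String>) -> UserBuilder<Id, E> {
        let Self { email, first_name, last_name, .. } = self;
        UserBuilder { id: Id(id.into()), email, first_name, last_name }
    }

    // Changes the `email` state, keeps the `id` state `I`.
    fn email(self, email: impl Into<String>) -> UserBuilder<I, Email> {
        let Self { id, first_name, last_name, .. } = self;
        UserBuilder { id, email: Email(email.into()), first_name, last_name }
    }

    // Optional setters are written once and work in every state.
    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }

    fn last_name(mut self, last_name: impl Into<String>) -> Self {
        self.last_name = Some(last_name.into());
        self
    }
}

// `build()` exists only when both mandatory fields are present.
impl UserBuilder<Id, Email> {
    fn build(self) -> User {
        let UserBuilder { id: Id(id), email: Email(email), first_name, last_name } = self;
        User { id, email, first_name, last_name }
    }
}
```

Since `build()` is defined only in the `impl UserBuilder<Id, Email>` block, any chain that has not gone through both `.id()` and `.email()` has no `build` method to call.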
Let's test it out. The following snippet compiles as expected:
let greyblake = User::builder()
    .id("13")
    .email("example@example.com")
    .first_name("Serhii")
    .build();
While this one does not:
let greyblake = User::builder()
    .id("13") // <-- email is missing
    .first_name("Serhii")
    .build();
Error:
15 | struct UserBuilder<I, E> {
| ------------------------ method `build` not found for this
...
93 | .build();
| ^^^^^ method not found in `UserBuilder<Id, NoEmail>`
|
= note: the method was found for
- `UserBuilder<Id, Email>`
Assuming that descriptive type names (NoEmail and Email) are used, the produced error message should be sufficient to understand the cause of the error and to figure out how to fix it (in this example, an email value needs to be set).
See the complete code in the Rust playground.
Now we are done. And yes, this approach is also not without cons:
- Adding new mandatory fields will require extending the generic builder type all over the place.
- Error messages can be obscure if bad naming is used.
So make your trade-offs wisely.
Typed builder
In practice, you should probably consider using the typed-builder crate instead of crafting the builder manually:
use typed_builder::TypedBuilder;
// Annotating a struct with #[derive(TypedBuilder)] generates its builder.
Thanks to sasik520 for pointing to this crate on reddit.
Summary
This article showed how typestate can be combined with the builder pattern to enforce correct usage of the builder at compile time.
Some of the common use cases for typestate:
- Enforce order of function calls
- Forbid a function to be called twice
- Mutually exclusive function calls
- Require a function to be always called
Links
The inspiration for this article came from the following resources, which I recommend checking out:
- Phantom Builder Pattern (Elm Radio)
- The Typestate Pattern in Rust by Cliffle
- Type-Driven API Design in Rust by Will Crichton
P.S.
Originally I was planning to write an article about the Phantom Builder pattern. In essence, a phantom builder is just a very specific case of builder and typestate where the state field is never used at runtime (hence the name).
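To make the distinction concrete, here is a small sketch of a phantom variant, assuming a single mandatory field to keep it short. The marker names `NoId` and `HasId` are hypothetical, and the state is tracked with `std::marker::PhantomData`, so no state value exists at runtime.

```rust
use std::marker::PhantomData;

#[derive(Debug, PartialEq)]
struct User {
    id: String,
    first_name: Option<String>,
}

// Zero-sized markers; no value of these types is ever stored.
struct NoId;
struct HasId;

struct UserBuilder<State> {
    id: Option<String>,
    first_name: Option<String>,
    // Exists only at the type level; occupies no space at runtime.
    _state: PhantomData<State>,
}

impl User {
    fn builder() -> UserBuilder<NoId> {
        UserBuilder { id: None, first_name: None, _state: PhantomData }
    }
}

impl<State> UserBuilder<State> {
    // Setting the id moves the builder into the `HasId` state.
    fn id(self, id: impl Into<String>) -> UserBuilder<HasId> {
        UserBuilder { id: Some(id.into()), first_name: self.first_name, _state: PhantomData }
    }

    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }
}

impl UserBuilder<HasId> {
    fn build(self) -> User {
        User {
            // The public API reaches `HasId` only through `id()`, so the value is present.
            id: self.id.expect("`HasId` implies that `id` was set"),
            first_name: self.first_name,
        }
    }
}
```

Compared with the generic builder above, the type parameter here is pure bookkeeping: the data lives in ordinary `Option` fields, and the guarantee that `id` is present rests on the API rather than on the field's type.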