Property-Based Testing in Rust with Arbitrary
Serhii Potapov October 21, 2022 #rust #testing #arbitrary #fuzzing
In my opinion, arbitrary is one of the most underrated Rust crates.
What is so special about it?
It provides the Arbitrary trait.
If a type implements Arbitrary, then a value of that type can be obtained from Unstructured, which is essentially just a sequence of bytes.
This is the foundation for fuzz testing in the Rust ecosystem. Whilst fuzzing is not a common practice for every project, Arbitrary can also be used for property-based testing.
Property-based testing VS fuzzing
Property-based testing and fuzzing share the same idea: generate random structured input and feed it to the software, aiming to make it break or behave incorrectly. However, fuzzing is generally a black-box testing method and can run for a very long time (e.g. months), while property-based tests belong to the unit test suite and may run for a fraction of a second.
I find that in some situations property-based testing can be a very good replacement for classical unit tests, namely when we need to test symmetrical data conversions, like:
- Serialization and deserialization
- Converting between domain models and DTOs
- Converting between domain models and the persistence layer (database records)
Introduction to the problem
Let's say we have a domain model called Vehicle.
A vehicle can be either a car or a bicycle; a car has fuel associated with it and an optional maximum speed in kilometers per hour.
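A minimal sketch of such a model, reconstructed from the description above and from the debug output shown later in the article. The derives and the integer types (i32 for the ID, u32 for the speed) are assumptions.

```rust
// Sketch of the domain model. Field types and derives are assumptions.

/// Identifier of a vehicle (assumed to wrap an i32).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VehicleId(pub i32);

/// Kind of fuel a car runs on.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Fuel {
    Electricity,
    Diesel,
}

/// A vehicle is either a car (with fuel and an optional top speed) or a bicycle.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VehicleType {
    Car {
        fuel: Fuel,
        /// Maximum speed in kilometers per hour, if known.
        max_speed_kph: Option<u32>,
    },
    Bicycle,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Vehicle {
    pub id: VehicleId,
    pub vehicle_type: VehicleType,
}
```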
If we want to store such a model in a relational database like PostgreSQL, we need to convert it into a flat structure that resembles a record in the database.
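One possible shape for such a record is sketched below. It assumes that nullable columns are modelled as Option and that the enums are stored as strings; the column names and types are assumptions made for illustration.

```rust
/// Flat representation of a vehicle, one field per database column.
/// Column names and types are assumptions, not a confirmed schema.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VehicleRecord {
    pub id: i32,
    /// "car" or "bicycle"
    pub vehicle_type: String,
    /// NULL for bicycles
    pub fuel: Option<String>,
    /// NULL for bicycles and for cars without a known maximum speed
    pub max_speed_kph: Option<i32>,
}
```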
We'll also need functions for bidirectional conversion between Vehicle and VehicleRecord.
Finally, let's implement a unit test to verify the conversion from Vehicle to VehicleRecord and vice versa. If everything works correctly, we should get the same model back.
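The conversion functions and the round-trip test could look roughly like the sketch below. The function names vehicle_to_record, record_to_vehicle, fuel_to_str and fuel_from_str come from the article; the type definitions, the string values ("car", "diesel", ...) and the concrete test vehicle are assumptions. The conversions panic on bad input, as discussed in the P.S.

```rust
use std::convert::TryInto;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VehicleId(pub i32);

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Fuel {
    Electricity,
    Diesel,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VehicleType {
    Car { fuel: Fuel, max_speed_kph: Option<u32> },
    Bicycle,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Vehicle {
    pub id: VehicleId,
    pub vehicle_type: VehicleType,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VehicleRecord {
    pub id: i32,
    pub vehicle_type: String,
    pub fuel: Option<String>,
    pub max_speed_kph: Option<i32>,
}

fn fuel_to_str(fuel: &Fuel) -> &'static str {
    match fuel {
        Fuel::Electricity => "electricity",
        Fuel::Diesel => "diesel",
    }
}

fn fuel_from_str(s: &str) -> Fuel {
    match s {
        "electricity" => Fuel::Electricity,
        "diesel" => Fuel::Diesel,
        other => panic!("unknown fuel: {}", other),
    }
}

pub fn vehicle_to_record(vehicle: Vehicle) -> VehicleRecord {
    let Vehicle { id: VehicleId(id), vehicle_type } = vehicle;
    match vehicle_type {
        VehicleType::Car { fuel, max_speed_kph } => VehicleRecord {
            id,
            vehicle_type: "car".to_string(),
            fuel: Some(fuel_to_str(&fuel).to_string()),
            // u32 -> i32: panics if the value does not fit into i32
            max_speed_kph: max_speed_kph.map(|speed| speed.try_into().unwrap()),
        },
        VehicleType::Bicycle => VehicleRecord {
            id,
            vehicle_type: "bicycle".to_string(),
            fuel: None,
            max_speed_kph: None,
        },
    }
}

pub fn record_to_vehicle(record: VehicleRecord) -> Vehicle {
    let vehicle_type = match record.vehicle_type.as_str() {
        "car" => VehicleType::Car {
            fuel: fuel_from_str(record.fuel.as_deref().expect("a car must have fuel")),
            // i32 -> u32: panics on negative values
            max_speed_kph: record.max_speed_kph.map(|speed| speed.try_into().unwrap()),
        },
        "bicycle" => VehicleType::Bicycle,
        other => panic!("unknown vehicle type: {}", other),
    };
    Vehicle { id: VehicleId(record.id), vehicle_type }
}

#[test]
fn test_vehicle_record_mapping() {
    // One hand-picked example: an electric car without a known maximum speed.
    let vehicle = Vehicle {
        id: VehicleId(123),
        vehicle_type: VehicleType::Car { fuel: Fuel::Electricity, max_speed_kph: None },
    };
    let record = vehicle_to_record(vehicle.clone());
    let same_vehicle = record_to_vehicle(record);
    assert_eq!(vehicle, same_vehicle);
}
```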
Even though the test is correct, it's not exhaustive, and the following cases are missing:
- fuel is Fuel::Diesel
- max_speed_kph is present
- vehicle_type is VehicleType::Bicycle
Shall we copy-paste 2-3 more tests? Or just pretend it's sufficient?
Welcome Arbitrary
Let's introduce arbitrary to the project:
cargo add arbitrary --features=derive
We also want to derive the Arbitrary trait for our models: import arbitrary::Arbitrary and add Arbitrary to the derive list of Vehicle, and likewise for VehicleId, VehicleType and Fuel.
Now we can play with it a little to get a better feel for how it works, for example by building a Vehicle from random bytes and printing it with dbg!.
Output:
&vehicle = Vehicle {
    id: VehicleId(
        414394623,
    ),
    vehicle_type: Car {
        fuel: Electricity,
        max_speed_kph: Some(
            3870351,
        ),
    },
}
Isn't it wonderful? Out of nothing but random bytes we have obtained a structured vehicle!
If we can keep those bytes coming, we can generate an endless number of vehicles to feed the unit test.
Rewriting the test with Arbitrary and arbtest
If we want to use Arbitrary for property-based testing, a tiny library like arbtest comes in handy.
It's a bit raw, but it gets the job done. Let's rewrite our test with it.
It's similar to the old test, but with a few differences:
- The assertions are now inside the prop function.
- The prop function receives Unstructured, which is used to obtain an arbitrary vehicle.
- The responsibility of the builder is to generate Unstructured and feed it to prop().
In fact, prop is invoked multiple times within a single test; by default, arbtest runs it for 200ms.
OK, let's run the tests and see if it works:
failures:
---- test_vehicle_record_mapping stdout ----
thread 'test_vehicle_record_mapping' panicked at 'called `Result::unwrap()` on an `Err` value: TryFromIntError(())', src/main.rs:49:60
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
arb_test failed!
Seed: 0x25dc50a20000003e
Oh no! Our bullet-proof code panics on line 49 within vehicle_to_record(), in the max_speed_kph.map(...) conversion.
arbtest also printed the seed 0x25dc50a20000003e. With this seed, we can deterministically reproduce the failure and fix the problem.
Let's slightly tweak our test. We add dbg!(&vehicle); to see exactly which vehicle causes the problem, and we initialize the builder with .seed(0x25dc50a20000003e), which allows us to reproduce the same failure. When we run it again, we can also see the vehicle details:
&vehicle = Vehicle {
    id: VehicleId(
        1455468422,
    ),
    vehicle_type: Car {
        fuel: Diesel,
        max_speed_kph: Some(
            2207965846,
        ),
    },
}
With all that information, the failure is easy to explain: we use u32 for max_speed_kph in the domain model, but i32 in the DB record, and 2207965846 is above the maximum i32 value (2^31 - 1 = 2147483647).
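The failing conversion can be reproduced in isolation with the standard library alone. In the sketch below, speed_to_db is a hypothetical helper name used only for illustration.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Converts a speed from the domain type (u32) to the DB column type (i32).
/// Returns an error for values above i32::MAX instead of panicking.
pub fn speed_to_db(speed_kph: u32) -> Result<i32, TryFromIntError> {
    i32::try_from(speed_kph)
}
```

Calling speed_to_db(2_207_965_846) returns an Err, and calling unwrap() on such an error is what produced the panic above.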
So how do we fix it? We go to the product owner and ask a weird question.
"Hey! Do we plan to expand our system to support tracking of spaceships with speeds over 65535 km/h?"
"Oh God, no."
"Is there any possibility that we will have to support any other vehicles with speeds over 65535 km/h?" - we continue.
"Of course not! Our business is about the city's transportation network, which implies..."
"I know, I know, thank you. I just wanted to double-check."
So that means we can replace u32 with u16 for max_speed_kph in the domain model, and we're fine: every u16 value fits into an i32.
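Why the narrower type helps can be shown with a small sketch that uses only the standard library. The helper names speed_to_db and speed_from_db are hypothetical.

```rust
use std::convert::TryFrom;

/// Domain -> DB: every u16 fits into i32, so this conversion cannot fail.
pub fn speed_to_db(speed_kph: u16) -> i32 {
    i32::from(speed_kph)
}

/// DB -> domain: still fallible in general, because the column could hold
/// values outside 0..=65535, but it always succeeds for values that were
/// written by speed_to_db.
pub fn speed_from_db(speed_kph: i32) -> Option<u16> {
    u16::try_from(speed_kph).ok()
}
```

The round trip speed_from_db(speed_to_db(x)) returns Some(x) for every u16 value, which is the property the test relies on.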
We run the tests and they pass. To be extra sure, we can set the time budget to 1 minute and run it locally once:
arbtest::builder().budget_ms(60_000).run(prop);
Still no failure, so we're good to go.
Conclusions
As shown in this article, property-based testing can be very handy for testing symmetrical data conversions. It has a number of advantages over classical unit tests:
- Much less test code needs to be written. This becomes an especially noticeable win when you deal with much larger data structures.
- The test inputs are not biased, which is never the case when test inputs are created by human beings (aka developers).
P.S.
In a real production application, we'd rather return errors instead of just panicking in the conversions.
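As a rough illustration of what that might look like for one of the helpers, here is a fallible variant of fuel_from_str. The UnknownFuel error type and the string values are hypothetical; a real application would likely use a richer error type.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Fuel {
    Electricity,
    Diesel,
}

/// Hypothetical error for a fuel value that cannot be mapped to the domain model.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UnknownFuel(pub String);

impl fmt::Display for UnknownFuel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown fuel: {}", self.0)
    }
}

impl std::error::Error for UnknownFuel {}

/// Fallible counterpart of a panicking fuel_from_str:
/// unknown values are reported to the caller instead of aborting.
pub fn fuel_from_str(s: &str) -> Result<Fuel, UnknownFuel> {
    match s {
        "electricity" => Ok(Fuel::Electricity),
        "diesel" => Ok(Fuel::Diesel),
        other => Err(UnknownFuel(other.to_string())),
    }
}
```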
Someone may prefer implementing the From trait instead of the vehicle_to_record() and record_to_vehicle() functions. It's rather a question of taste, but there are 2 main reasons I did not go with From:
- I don't want From to panic.
- I don't want the conversions to be exposed outside of the infrastructure layer.
Instead of implementing fuel_from_str and fuel_to_str, we could derive FromStr and Display using the strum crate.
If you're confused about why we need both Vehicle and VehicleRecord, I'd recommend reading Domain Modeling Made Functional by Scott Wlaschin.
This is out of scope for this article, but long story short: we want to keep the domain model Vehicle as expressive as possible, so it's easy to work with in the domain layer, whilst VehicleRecord is easy to persist in a relational database.