Rust functions are surprisingly diverse, sitting at the intersection of multiple language features which may take time to understand. In this post, we’ll walk through those features and explain how they appear in function signatures, so you can be well-equipped to understand functions you see in the wild, or identify the best way to write the functions you need in your own code.
Preface
Info 1 Before We Begin
Describing, not Recommending
This is a survey of what function signatures can look like in Rust, not a commentary on what they should look like. Any one of the patterns shown here may be seen in the wild, and learning to read other people’s code in any language is a valuable skill.
Part of a Series
This is the first of a pair of posts describing how to read Rust function signatures. Part 2, tackling generic functions, is currently in the works.
Signatures
First things first, we need a function signature. Let’s start with something basic.
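A minimal sketch of such a signature; the names `add`, `x`, and `y` are assumptions chosen for illustration:

```rust
// Hypothetical example function: takes two `i32` values and returns an `i32`.
fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    println!("{}", add(2, 3));
}
```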
This is a function that takes in two 32-bit integers (the `i32` type), and returns a 32-bit integer as well. The arrow (`->`) indicates the return type.
Note that the types are all explicit. Rust does support type inference, but not for function signatures. All parameter and return types must be specified.
We can also see that the pattern comes before the type, separated by a colon. This `<pattern>: <type>` syntax matches Rust’s general syntax for type ascription, which is the Rust term for specifying types explicitly.
Destructuring
Notice too that I keep saying “pattern” instead of “variable name.” That’s because Rust parameters are patterns, meaning you can destructure and bind against the internal structure of the types.
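A sketch of what destructuring in a signature can look like, assuming a tuple struct named `IpV4Address` holding four `u8` octets. The function names `print_address` and `address_to_string` are hypothetical:

```rust
// Assumed definition: a tuple struct holding the four octets of an address.
struct IpV4Address(u8, u8, u8, u8);

// The parameter is a pattern: it destructures the struct and binds each
// octet to its own variable.
fn print_address(IpV4Address(o1, o2, o3, o4): IpV4Address) {
    println!("{}.{}.{}.{}", o1, o2, o3, o4);
}

// Same destructuring, but returning the text instead of printing it.
fn address_to_string(IpV4Address(o1, o2, o3, o4): IpV4Address) -> String {
    format!("{}.{}.{}.{}", o1, o2, o3, o4)
}

fn main() {
    print_address(IpV4Address(127, 0, 0, 1));
    println!("{}", address_to_string(IpV4Address(10, 0, 0, 2)));
}
```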
In this example, the `IpV4Address` type is destructured in the function signature, binding four variables (`o1`, `o2`, `o3`, and `o4`) to the respective items in the tuple struct. These variables are then used in the function body to print the address.
Refutability
One restriction of pattern matching in function signatures is that you can only use
irrefutable patterns, meaning patterns that always match.
By contrast, refutable patterns may sometimes fail to match,
perhaps because they specify only a single variant of an enum
with multiple variants.
Refutable patterns can be used in a `match` expression or equivalent construct, where the
collection of patterns is checked for exhaustiveness (meaning all values are guaranteed to
match at least one pattern), but patterns in function signatures are all alone, so they must
be irrefutable.
Here’s a table to help explain:
A survey of pattern refutability.
Pattern | Refutability |
---|---|
`a` | Irrefutable |
`_` | Irrefutable |
`MyStruct { f1: x, f2: y }` | Irrefutable |
`Some(b)` | Refutable |
`Err(MyError::Foo)` | Refutable |
The first three patterns could be found in a function signature. The last two could not.
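To make the distinction concrete, here is a small sketch. The struct definition and all function names are assumptions for illustration. The irrefutable patterns appear directly in signatures, while the refutable `Some(b)` pattern is moved into a `match` in the body:

```rust
// Hypothetical struct matching the `MyStruct { f1: x, f2: y }` pattern.
struct MyStruct {
    f1: i32,
    f2: i32,
}

// Irrefutable: a plain identifier always matches.
fn double(a: i32) -> i32 {
    a * 2
}

// Irrefutable: `_` matches any value and binds nothing.
fn ignore(_: i32) -> &'static str {
    "ignored"
}

// Irrefutable: a struct has exactly one shape, so this always matches.
fn sum_fields(MyStruct { f1: x, f2: y }: MyStruct) -> i32 {
    x + y
}

// `Some(b)` is refutable and is not allowed in a signature, so the function
// takes the whole `Option` and matches on it exhaustively in the body.
fn unwrap_or_zero(maybe: Option<i32>) -> i32 {
    match maybe {
        Some(b) => b,
        None => 0,
    }
}

fn main() {
    println!("{}", double(4));
    println!("{}", ignore(4));
    println!("{}", sum_fields(MyStruct { f1: 1, f2: 2 }));
    println!("{}", unwrap_or_zero(None));
}
```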
Renaming
Another feature of binding variables in a function signature pattern is that you can give a variable a name different from the name of the field it is bound to.
This can be useful when you want to locally use a different name than the name of the field, and do so in a single line rather than binding to a local variable with a new name in the body of the function.
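A short sketch of renaming, assuming a hypothetical `Rectangle` struct and `area` function:

```rust
// Hypothetical struct with descriptive field names.
struct Rectangle {
    width: u32,
    height: u32,
}

// `width: w` binds the `width` field to a variable named `w`, and likewise
// `height: h` binds the `height` field to `h`.
fn area(Rectangle { width: w, height: h }: Rectangle) -> u32 {
    w * h
}

fn main() {
    println!("{}", area(Rectangle { width: 3, height: 4 }));
}
```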
Bindings
As we use patterns to bind, we can also set the kind of binding, which may be the default (with no specifier), `mut`, `ref`, or `ref mut`. Collectively, the binding options are as follows:
Binding syntax and the resulting types
Binding Specifier | Mutability | Ownership |
---|---|---|
No specifier | Immutable | Owned |
`mut` | Mutable | Owned |
`ref` | Immutable | Borrowed |
`ref mut` | Mutable | Borrowed |
The binding pattern specifies how a binding should occur: should it be by-value (meaning in Rust that it either takes ownership of the bound value, or makes a copy of it, depending on whether the type of that value implements the `Copy` trait), by-reference, or by-mutable-reference?
Owning Binding
An owning binding may display one of two behaviors, depending on the type involved in the binding.
If the type implements the `Copy` trait, then it will be copied to the new owner at the binding site. If the type does not implement `Copy`, then it will be moved from the prior owner to the new owner at the binding site.
To understand this, let’s talk about `Copy`. `Copy` is a trait indicating a type is “trivially copyable,” meaning it can be duplicated with a plain bitwise copy (in effect, a call to `memcpy`): everything that needs duplicating sits directly in the value, so there are no pointers to chase. `Copy` tells us that copying a piece of data is fast.
At the same time, it improves ergonomics for certain types which may otherwise be tedious to use under Rust’s ownership semantics. Imagine if number types (which all implement `Copy`) were moved any time they were assigned. Something as simple as `x = y` would invalidate `y`, and thus make mathematical code much more frustrating to write.
So, `Copy` pulls double-duty. It tells us something is cheap to copy, and it permits that copying to be done implicitly.
In contexts where a type doesn’t implement `Copy` but does implement the `Clone` trait, you can instead call `.clone()` on it explicitly to create a duplicate which will be moved into the new owner, without invalidating the prior owner.
Now, what does this look like in the context of a binding?
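A minimal sketch of both behaviors, using the hypothetical functions `add_one` (a `Copy` parameter) and `consume` (a non-`Copy` parameter):

```rust
// `n` is an owning binding of a `Copy` type: the argument is copied in.
fn add_one(n: i32) -> i32 {
    n + 1
}

// `text` is an owning binding of a non-`Copy` type: the argument is moved in.
fn consume(text: String) -> usize {
    text.len()
}

fn main() {
    let n = 41;
    let m = add_one(n);
    println!("{} {}", n, m); // `n` is still usable after the call

    let text = String::from("hello");
    let cloned_len = consume(text.clone()); // explicit clone; `text` stays valid
    let moved_len = consume(text); // `text` is moved into the function here
    // println!("{}", text); // would not compile: `text` has been moved
    println!("{} {}", cloned_len, moved_len);
}
```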
Sometimes this kind of binding is exactly what you want. For example, you may have a need for a “consuming builder,” one of two forms the Builder Pattern can take in Rust. In a consuming builder, the builder type passes ownership of some data to the type that it’s building, because that type will need ownership of the data to operate.
If you take ownership of a piece of data with an owning binding and want to return ownership to the calling context, you can return it from the function. For example, the popular `once_cell` crate features a type which can only be written to once. If you try to write to it again, it returns ownership of the value you attempted to set.
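The sketch below illustrates this hand-back of ownership. As a stand-in for the `once_cell` crate it uses the standard library's `std::cell::OnceCell` (assuming Rust 1.70 or later), whose `set` method likewise returns the rejected value in `Err`. The helper `set_twice` is hypothetical:

```rust
use std::cell::OnceCell;

// Tries to write two values into the same cell and returns both results.
fn set_twice(first: String, second: String) -> (Result<(), String>, Result<(), String>) {
    let cell = OnceCell::new();
    // The first write succeeds and the cell takes ownership of `first`.
    let first_result = cell.set(first);
    // The second write fails, and ownership of `second` comes back in `Err`.
    let second_result = cell.set(second);
    (first_result, second_result)
}

fn main() {
    let (first, second) = set_twice(String::from("first"), String::from("second"));
    println!("{:?} {:?}", first, second);
}
```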
Mutable Owning Binding
Sometimes, in addition to taking ownership of a piece of data, you’d like for that data to be mutable from the start as well. In that case, you can use a mutable owning binding.
The use of a mutable owning binding can always be replaced with an immutable owning binding followed by a mutable rebinding to a variable of the same name in the body of the function (shadowing the parameter from that line onward). The choice is one of taste.
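Both styles side by side, as a sketch with hypothetical function names:

```rust
// Mutable owning binding: `text` is owned and mutable from the start.
fn exclaim(mut text: String) -> String {
    text.push('!');
    text
}

// Equivalent: immutable owning binding, then a mutable rebinding that
// shadows the parameter for the rest of the body.
fn exclaim_rebound(text: String) -> String {
    let mut text = text;
    text.push('!');
    text
}

fn main() {
    println!("{}", exclaim(String::from("hi")));
    println!("{}", exclaim_rebound(String::from("hi")));
}
```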
Reference Binding
Reference bindings inside function signatures in Rust can seem a little unusual, but they are permitted. The idea is that the binding performed is a reference to the type of the value. If the value was passed in by value, then it’s either moved or copied as discussed in the owning binding section, and in the body of the function the bound variable is a reference to the post-move data (if the type is `Copy`, the difference doesn’t amount to much). This is different from an owning binding of a reference type both for the caller of the function, and inside the function itself (`x: &Number` is not the same as `ref x: Number`).
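A sketch contrasting the two forms, assuming a hypothetical non-`Copy` tuple struct `Number`:

```rust
// Hypothetical non-`Copy` wrapper type.
struct Number(i32);

// `ref x: Number`: the caller passes a `Number` by value (it is moved in),
// and inside the function `x` has type `&Number`.
fn by_ref_binding(ref x: Number) -> i32 {
    x.0
}

// `x: &Number`: the caller passes a reference and keeps ownership.
fn by_reference_type(x: &Number) -> i32 {
    x.0
}

fn main() {
    let n = Number(7);
    println!("{}", by_reference_type(&n)); // `n` is only borrowed
    println!("{}", by_ref_binding(n)); // `n` is moved into the function
    // println!("{}", n.0); // would not compile: `n` has been moved
}
```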
Reference bindings are more useful in the presence of a reference type, along with destructuring. In that case, they permit convenient access to bind-by-reference the internal fields of a type which has been passed by reference.
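As a sketch, with a hypothetical `Person` struct: the parameter type is `&Person`, the `&` in the pattern looks through the reference, and `ref` binds each field by reference rather than trying to move it out of borrowed data.

```rust
// Hypothetical struct with one non-`Copy` field.
struct Person {
    name: String,
    age: u32,
}

// Inside the body, `name` is a `&String` and `age` is a `&u32`.
fn describe(&Person { ref name, ref age }: &Person) -> String {
    format!("{} is {}", name, age)
}

fn main() {
    let person = Person { name: String::from("Ada"), age: 36 };
    println!("{}", describe(&person));
    // `person` is still owned by `main` and remains usable.
    println!("{}", person.name);
}
```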
Mutable Reference Binding
Mutable reference bindings are similar to the above examples for immutable reference bindings, except they’re mutable.
Same as the other reference bindings, they may be considered surprising when used in the presence of a type passed by-value. When working instead with a type passed by reference, there is one additional thing to consider: you can’t get a mutable reference out of a value passed by immutable reference.
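A sketch of both situations, using a hypothetical `Counter` struct. `bump` destructures a value passed by mutable reference, while `bump_local` uses `ref mut` on a value passed by value:

```rust
// Hypothetical struct for illustration.
struct Counter {
    label: String,
    count: u32,
}

// The type is `&mut Counter`; `ref mut count` binds `count` as `&mut u32`.
// With a `&Counter` parameter type this pattern would be rejected, since a
// mutable reference cannot be obtained through an immutable one.
fn bump(&mut Counter { ref mut count, .. }: &mut Counter) {
    *count += 1;
}

// With a by-value type, `n` is a `&mut i32` pointing at the function's own
// copy, so the caller's value is unaffected.
fn bump_local(ref mut n: i32) -> i32 {
    *n += 1;
    *n
}

fn main() {
    let mut counter = Counter { label: String::from("hits"), count: 0 };
    bump(&mut counter);
    println!("{}: {}", counter.label, counter.count);

    let original = 5;
    println!("{} {}", bump_local(original), original);
}
```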
Binding vs. Type
Note as well that these bindings are relative to the type on the right hand side of the ascriptive clause. To explain, let’s see some examples, annotated with the resulting types.
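The following sketch pairs each binding form with the type the variable has inside the function. The function names are hypothetical, and the typed `let` lines are there only so the compiler confirms each annotation:

```rust
fn owned(x: String) -> usize {
    let checked: String = x; // x: String
    checked.len()
}

fn owned_mut(mut x: String) -> usize {
    x.push('!'); // x: String, and the binding is mutable
    x.len()
}

fn ref_binding(ref x: String) -> usize {
    let checked: &String = x; // x: &String
    checked.len()
}

fn ref_mut_binding(ref mut x: String) -> usize {
    let checked: &mut String = x; // x: &mut String
    checked.push('!');
    checked.len()
}

fn ref_type(x: &String) -> usize {
    x.len() // x: &String, because the type itself is a reference
}

fn ref_binding_of_ref_type(ref x: &String) -> usize {
    let checked: &&String = x; // x: &&String, a reference to the reference
    checked.len()
}

fn main() {
    let s = String::from("ab");
    println!("{} {}", ref_type(&s), ref_binding_of_ref_type(&s));
    println!("{} {}", owned(s.clone()), owned_mut(s.clone()));
    println!("{} {}", ref_binding(s.clone()), ref_mut_binding(s));
}
```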
Associated Functions
Associated functions are functions which are “associated” with a type, meaning they live under the namespace of that type. Otherwise, they behave like normal functions.
Constructors
Constructors, which usually return the associated type (called `Self`, with an uppercase “S”) or some wrapper of it (like `Result<Self, SomeErrorType>` or `Option<Self>`), are usually written as associated functions. It would be perfectly valid, for a type `Foo`, to write a constructor as a free function (meaning not associated with the type):
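A sketch of the free-function form, assuming `Foo` has a single hypothetical field `value`; the function name `new_foo` is also an assumption:

```rust
// Hypothetical definition of `Foo`.
struct Foo {
    value: i32,
}

// A constructor written as a free function: valid, but it lives outside
// the namespace of `Foo`.
fn new_foo(value: i32) -> Foo {
    Foo { value }
}

fn main() {
    let foo = new_foo(5);
    println!("{}", foo.value);
}
```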
However, doing this isn’t ideal Rust style. Instead, you’d use an associated function, like so:
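A sketch of the associated-function form, again assuming a hypothetical `value` field on `Foo`:

```rust
// Hypothetical definition of `Foo`.
struct Foo {
    value: i32,
}

impl Foo {
    // An associated function: there is no `self` parameter, and `Self`
    // refers to `Foo` inside the `impl` block.
    fn new(value: i32) -> Self {
        Self { value }
    }
}

fn main() {
    // Called as a path starting at the name of the type.
    let foo = Foo::new(5);
    println!("{}", foo.value);
}
```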
Note that `Foo::new` has access to the `Self` type (which is most convenient for complex `Self` types), and is called as a path starting at the name of the type.
Deref Collision & Smart Pointers
Another context where associated functions are commonly written is for smart pointers, which are types which wrap another type while still being usable as if they were the original type. The most common smart pointer types in Rust are `Box`, `Rc`, and `Arc`, and they all rely on a special trait called `Deref`. `Deref` enables a feature in Rust called deref coercion, which is used whenever a method call is made. Rust, at compile time, checks if the method is defined on the type it’s being called with, and on whatever type may be returned by that type’s `Deref` or `DerefMut` implementations (depending on the mutability of `self` in the method being checked), doing so for however many layers of deref-ing are available. This is what makes smart pointers easy to use in place of the original type!
However, because of deref coercion, defining methods on the smart pointer may make it difficult to call any methods on the contained type which have the same name. To avoid this collision, methods on smart pointer types are often defined as associated functions instead. The `Rc` type has multiple examples of this, with functions like `Rc::strong_count` (which returns the number of strong pointers to the underlying data currently live) being defined as associated functions.
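A small sketch of the difference in call style; the helper name `count_with_two_handles` is hypothetical:

```rust
use std::rc::Rc;

// Returns the strong count observed while two handles are alive.
fn count_with_two_handles() -> usize {
    let first = Rc::new(String::from("shared"));
    let second = Rc::clone(&first);
    // Method syntax reaches through the pointer to `String::len`.
    let _len = second.len();
    // `strong_count` is an associated function: it is called on `Rc` itself
    // and takes the pointer as an explicit argument.
    Rc::strong_count(&first)
}

fn main() {
    println!("{}", count_with_two_handles());
}
```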
Methods
Next, let’s look at methods in Rust. Methods are functions which are attached to a type, meaning they take a parameter called `self`. These are distinct from associated functions syntactically by the presence of the “receiver.” The receiver is `self`, and represents the specific datum of the type on which the method is being called. The receiver can have a number of possible types, three of which come with special shorthand syntax because they are the most common options.
List of receiver types and syntactic sugar for them.
Receiver | Type | Shorthand |
---|---|---|
`self: Self` | Owning | `self` |
`self: &Self` | Reference | `&self` |
`self: &mut Self` | Mutable reference | `&mut self` |
`self: Box<Self>` | Owning pointer | None |
`self: Rc<Self>` | Reference counted pointer | None |
`self: Arc<Self>` | Thread-safe reference counted pointer | None |
`self: Pin<&mut Self>` | Pinned mutable reference | None |
… | Nested combinations of any of the above | None |
Each of these has its own distinct meaning, and it’s worthwhile to discuss when and why you’d use each of them.
Owning Receiver
Taking ownership of `self` means that, unless you pass ownership out to a new owner, the `self` object will be dropped at the end of the function, as its owner has gone out of scope. If the type implements the `Drop` trait, its `Drop::drop` implementation will be run to perform any deallocation or cleanup work necessary. Taking `self` by value is commonly used in situations like the Builder pattern, where you want to consume the builder and return whatever object it’s designed to build.
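A sketch of a consuming builder; the `Request` and `RequestBuilder` types and their fields are hypothetical:

```rust
// Hypothetical type being built.
#[derive(Debug, PartialEq)]
struct Request {
    url: String,
    retries: u32,
}

// Hypothetical builder for `Request`.
struct RequestBuilder {
    url: String,
    retries: u32,
}

impl RequestBuilder {
    fn new(url: &str) -> Self {
        RequestBuilder { url: url.to_string(), retries: 0 }
    }

    // `mut self` takes ownership of the builder and hands it back modified.
    fn retries(mut self, retries: u32) -> Self {
        self.retries = retries;
        self
    }

    // `self` is consumed; its fields are moved into the finished `Request`.
    fn build(self) -> Request {
        Request { url: self.url, retries: self.retries }
    }
}

fn main() {
    let request = RequestBuilder::new("https://example.com").retries(3).build();
    println!("{:?}", request);
}
```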
Reference Receiver
Taking `self` by reference means that `self` will be borrowed for the duration of the function call. Rust’s rules disallow simultaneous mutable and immutable borrows, so if a function takes `self` by reference, the caller will be unable to mutate the object until the function call ends.
Mutable Reference Receiver
Taking `self` by mutable reference means that `self` will be mutably borrowed for the duration of the function call. As always, Rust’s “aliasing XOR mutability” rule is in play.
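The two reference receivers side by side, sketched with a hypothetical `Tally` type:

```rust
// Hypothetical type for illustration.
struct Tally {
    count: u32,
}

impl Tally {
    // `&self` is shorthand for `self: &Self`: read-only access.
    fn get(&self) -> u32 {
        self.count
    }

    // `&mut self` is shorthand for `self: &mut Self`: exclusive access,
    // so the method may modify the value.
    fn increment(&mut self) {
        self.count += 1;
    }
}

fn main() {
    let mut tally = Tally { count: 0 };
    tally.increment();
    tally.increment();
    println!("{}", tally.get());
}
```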
Info 2 Disjoint Borrows and Partial Moves in Method Calls
Whenever two or more pieces of data from a single struct are borrowed at the same time, Rust performs an analysis to see if the two borrows are disjoint borrows. For example, it is perfectly fine to immutably borrow one field of a struct, and mutably borrow another field of the same struct at the same time, as those two fields are different.
This analysis can be stymied by hiding a borrow behind a method. If one or more of the borrows happens within a method of the outer type, then the borrows are no longer seen as disjoint, because the method on the outer type would take `self` by some sort of reference. From the perspective of the borrow checker, with the introduction of a method call, all of `self` is now borrowed at the same time as one of its fields is borrowed, and unlike the original case, these borrows are not disjoint, and do not pass borrow checking.
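A sketch of the accepted case, with the rejected variant left in comments. The `Player` type and the function `bump_and_describe` are hypothetical:

```rust
// Hypothetical struct with two independent fields.
struct Player {
    name: String,
    score: u32,
}

impl Player {
    // Borrows all of `self`, even though only `name` is read.
    fn name(&self) -> &str {
        &self.name
    }
}

// Direct field access: the two borrows are disjoint, so this compiles.
fn bump_and_describe(player: &mut Player) -> String {
    let name = &player.name; // immutable borrow of one field
    let score = &mut player.score; // mutable borrow of a different field
    *score += 1;
    format!("{}: {}", name, score)
}

fn main() {
    let mut player = Player { name: String::from("ana"), score: 1 };
    println!("{}", bump_and_describe(&mut player));
    println!("{}", player.name());

    // Routing the first borrow through the method is rejected, because
    // `player.name()` borrows the whole struct while `name` is alive:
    //
    // let name = player.name();
    // let score = &mut player.score;
    // *score += 1;
    // println!("{}: {}", name, score);
}
```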
The same problem arises with partial moves, where a field of a type is moved, but not the whole type. If a partial move is relocated to a method that takes ownership of `self`, then the move is no longer partial in the calling context, which may cause a compilation error. Dr. David Pearce has a more in-depth guide to partial moves which explains them nicely.
The remaining receiver types are less common, but no less important.
Owning Pointer Receiver
First, `Box<Self>` indicates that you’re taking ownership of a pointer to `self`. Most of the time this isn’t necessary, but one particular use case arises when working with unsized types. Rust requires (and many CPU architectures require) function parameters to have sizes known at compile-time; because of this, special care must be taken with the treatment of types without a known size. Slices are one example, because they are an arbitrarily-sized view into a memory location, and trait objects are another, because the size of the actual data is hidden when the concrete type is erased as part of trait object construction. When implementing a method for an unsized type, taking a parameter as `self: Self` is invalid, because `Self: !Sized` (this is the notation indicating that `Self` does not implement the `Sized` trait). However, taking it as `Box<Self>` is valid, because the size of the pointer is known at compile-time, and it’s now a pointer being passed instead of the underlying data.
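A sketch of a `Box<Self>` receiver used through a trait object; the `Task` trait and `Greeting` type are hypothetical:

```rust
// Hypothetical trait whose method consumes a boxed receiver.
trait Task {
    fn run(self: Box<Self>) -> String;
}

struct Greeting {
    name: String,
}

impl Task for Greeting {
    fn run(self: Box<Self>) -> String {
        // Dereferencing the box moves the concrete value out of it.
        let Greeting { name } = *self;
        format!("hello, {}", name)
    }
}

fn main() {
    // The concrete type is erased; only a pointer of known size is passed.
    let task: Box<dyn Task> = Box::new(Greeting { name: String::from("world") });
    println!("{}", task.run());
}
```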
Info 3 Why Unsized Receivers Don’t Work
Yandros, on the Rust User Forum, has a more thorough explanation of this subject, which I recommend reading for a deeper understanding.
As a summary: Rust generics are monomorphized, meaning the compiler generates individual copies of generic code for each unique set of types it’s called with. On most computer architectures, function calls require knowing the exact size of the parameters passed to them, and in the case of unsized types, that size is unknown. So monomorphizing generic code where `Self` doesn’t implement `Sized` doesn’t work in all cases, and is therefore rejected by the Rust compiler. Wrapping `Self` in a `Box` or other pointer makes the size known (it’s the size of a pointer type).
Reference-Counted Pointer Receivers
The two other pointer types are provided for similar reasons. `Rc` and `Arc` are respectively the not-thread-safe and thread-safe versions of a reference-counted pointer, and they provide the same value as a receiver type that `Box` does, with the addition of permitting multiple pointers to exist to the same data.
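A sketch of an `Rc<Self>` receiver; the `Listener` type and its `register` method are hypothetical. An `Arc<Self>` receiver would look the same with `std::sync::Arc` substituted:

```rust
use std::rc::Rc;

// Hypothetical type for illustration.
struct Listener {
    name: String,
}

impl Listener {
    // Callable only on an `Rc<Listener>`. The method receives its own
    // reference-counted handle, which it is free to store.
    fn register(self: Rc<Self>, registry: &mut Vec<Rc<Listener>>) {
        registry.push(self);
    }
}

fn main() {
    let listener = Rc::new(Listener { name: String::from("logger") });
    let mut registry = Vec::new();
    // Clone the handle so the caller keeps one and the registry gets one.
    Rc::clone(&listener).register(&mut registry);
    println!("{} {}", listener.name, Rc::strong_count(&listener));
}
```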
Pinned Receiver
Then there’s `Pin<&mut Self>`. `Pin` is a type which indicates the data pointed to by the pointer inside of it never moves in memory (unless that data implements the `Unpin` trait, in which case it may be safely moved even when inside of a `Pin`). `Pin<&mut Self>` means that `self` is pinned, and may not move. The context you’re most likely to see this in Rust today is around the `Future` trait in the standard library, which defines a single method where the receiver type is `Pin<&mut Self>`. Explaining why futures need pinning is a more involved topic though, so I recommend reading the Rust Async Book if you’re interested, as it’s covered in great detail and care there.
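To show the receiver in context, here is a sketch of a hand-written future. The `Countdown` type, the `NoopWaker` helper, and the polling loop in `poll_to_completion` are all assumptions for illustration; a real program would normally leave polling to an async runtime:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical future that becomes ready after a fixed number of polls.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    // The receiver is a pinned mutable reference to `Self`.
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            // `Countdown` is `Unpin`, so its fields can be mutated through the pin.
            self.remaining -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// A waker that does nothing; note the `Arc<Self>` receiver on `wake`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Polls the future in a loop and reports how many polls it took.
fn poll_to_completion(mut future: Countdown) -> (u32, &'static str) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut pinned = Pin::new(&mut future);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = pinned.as_mut().poll(&mut cx) {
            return (polls, value);
        }
    }
}

fn main() {
    println!("{:?}", poll_to_completion(Countdown { remaining: 2 }));
}
```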
Nested Receivers
Finally, you can nest any of these receiver types as well, so `self: Box<Box<Self>>` or `self: Rc<Box<Pin<&mut Self>>>` work as receiver types, although these are even less likely to be necessary than the un-nested versions we’ve just covered.
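A brief sketch of a nested receiver, using a hypothetical `Message` type:

```rust
// Hypothetical type for illustration.
struct Message(String);

impl Message {
    // A nested receiver: the method is callable on a `Box<Box<Message>>`.
    fn into_text(self: Box<Box<Self>>) -> String {
        let inner: Box<Self> = *self; // unwrap the outer box
        let Message(text) = *inner; // unwrap the inner box
        text
    }
}

fn main() {
    let nested = Box::new(Box::new(Message(String::from("hi"))));
    println!("{}", nested.into_text());
}
```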
Conclusion
This covers the basics of reading Rust functions. After reading this post, you should hopefully have a better understanding of some of the following concepts:
- That the left-hand side of each parameter defined in a function signature is an irrefutable pattern which can feature destructuring, renaming, and one of four possible bindings.
- That the right-hand side of each parameter in a function signature is a type, which may be a reference or owning type, and that the selection of type interacts with the selection of binding to determine whether the actual parameter passed in the calling context is moved into the function, and whether the formal parameter inside the function is a reference or non-reference type.
- That associated functions may be used to put functions inside of the namespace of a particular type, signaling their association to that type, and may include constructors or (in the case of smart pointers) functions which would normally be written as methods, but are written as associated functions to avoid possible naming conflicts due to deref coercion.
- That methods may feature a number of different receiver types, which are selected based on the needs of the function and the future callers of it.
Info 4 Coming in Part 2
Part 2 will introduce generic functions, and cover topics including:
- Trait bounds, including complex bounds featuring multiple traits.
- Associated types, how they differ from generic types, and how they may be used in trait bounds.
- The “impl Trait” syntax, what it enables in return position, and its syntactic value in non-return position.
- Trait objects, what they are, when they can be created, and what restrictions exist around their use.
- Lifetimes, including the use of lifetime parameters and lifetime bounds.
- Subtype polymorphism, and how it appears and is used in Rust.
- Function parameter types, including function pointers, closures, and the use of Higher-Rank Trait Bounds.
I won’t set a date for when Part 2 will be done, but it is actively in the works. I also haven’t yet decided whether it will include coverage of const generics and specialization, both improvements to Rust’s type system which entail syntactic additions and could be covered in this post, but which are unfinished and unstable, and so may be premature to cover at this time.