TL;DR: Err is a "type-name" and not actually a classic OOP-style constructor, meaning there is no consistent breakpoint to target unless one is artificially injected through a (currently non-existent, hypothetical) compiler-specific option.
I'll try to give this one a shot.
I believe what you want to accomplish is incompatible with how the "one true Rust implementation" is currently constructed and its take on "enum constructors", short of some serious hacks. I'll give my best inference about why (as of the time of writing, 22 Sep 2022), and give you some ideas and options.
Breaking it down: finding definitions
"What happens when you "construct" an enum, anyways...?"
As Rust does not have a formal language standard or specification document, its "semantics" are not particularly precisely defined, so there is no "legal" text to really provide the "Word of God" or final authority on this topic.
So instead, let's refer to community materials and some code:
There is exactly one way to create an instance of a user-defined type: name it, and initialize all its fields at once:
...
That's it. Every other way you make an instance of a type is just calling a totally vanilla function that does some stuff and eventually bottoms out to The One True Constructor.
Unlike C++, Rust does not come with a slew of built-in kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors. The reasons for this are varied, but it largely boils down to Rust's philosophy of being explicit.
Move constructors are meaningless in Rust because we don't enable types to "care" about their location in memory. Every type must be ready for it to be blindly memcopied to somewhere else in memory. This means pure on-the-stack-but-still-movable intrusive linked lists are simply not happening in Rust (safely).
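As a minimal sketch of what the quoted passage describes (the Point and Shape types and the Point::new function are hypothetical names made up for illustration), the only primitive way to build a value is to name the type or variant and supply its fields. Anything that looks like a constructor is an ordinary function that ends in that same expression:

```rust
// Hypothetical types, used only to illustrate the quoted passage.
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Debug, PartialEq)]
enum Shape {
    Circle { center: Point, radius: u32 },
    Dot(Point),
}

impl Point {
    // An ordinary function, not a special kind of constructor. It could run
    // any logic it likes, and it bottoms out in the struct literal below.
    fn new(x: i32, y: i32) -> Point {
        Point { x, y }
    }
}

fn main() {
    // "Name it, and initialize all its fields at once."
    let a = Point { x: 1, y: 2 };
    // The same value, built through a plain function.
    let b = Point::new(1, 2);
    assert_eq!(a, b);

    // Enum variants are built the same way: name the variant, give its data.
    let dot = Shape::Dot(Point::new(0, 0));
    let circle = Shape::Circle { center: a, radius: 3 };
    println!("{:?} {:?} {:?}", b, dot, circle);
}
```

A debugger has something to stop on inside Point::new because it is a function with a body. The literal expressions are just value construction.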
In comparison to C++'s better-specified semantics for both enum class constructors and std::variant<Ts...> (its closest analogue to Rust's enum), Rust does not really say anything about "enum constructors" specifically, except that they are just part of "The One True Constructor."
The One True Constructor is not really a well-specified Rust concept. The term is not commonly used in Rust's references or books, and it's not a general programming language theory concept (at least, not by that exact name; it most likely refers to type constructors, which we'll get to). Still, you can eke out its meaning by reading further and by comparing Rust to the programming languages it takes direct inspiration from.
In fact, where C++ might have move, copy, placement new, and other kinds of constructors, Rust simply has a sort of universal "dumb value constructor" for all values (like struct and enum) that does not have special operational semantics besides something like "create the value, wherever it might be stored in memory".
But that's not very precise at all. What if we try to look at the definition of an enum?
...
We attach data to each variant of the enum directly, so there is no need for an extra struct. Here it’s also easier to see another detail of how enums work: the name of each enum variant that we define also becomes a function that constructs an instance of the enum. That is, IpAddr::V4() is a function call that takes a String argument and returns an instance of the IpAddr type. We automatically get this constructor function defined as a result of defining the enum.
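The book's claim can be checked with a small sketch. The IpAddr enum below follows the shape used in the quoted passage; the variable names are my own. It shows only that a tuple-variant name type-checks as a function value. It says nothing about what machine code a direct IpAddr::V4(...) expression produces, which is the question that matters for breakpoints.

```rust
// The enum shape from the book's example.
#[derive(Debug, PartialEq)]
enum IpAddr {
    V4(String),
    V6(String),
}

fn main() {
    // Direct "call" syntax.
    let home = IpAddr::V4(String::from("127.0.0.1"));

    // The same kind of variant name coerced to a plain function pointer.
    let make_v6: fn(String) -> IpAddr = IpAddr::V6;
    let loopback = make_v6(String::from("::1"));

    // ...or passed to a higher-order function like any other function.
    let addrs: Vec<IpAddr> = ["10.0.0.1", "10.0.0.2"]
        .iter()
        .map(|s| s.to_string())
        .map(IpAddr::V4)
        .collect();

    println!("{:?} {:?} {:?}", home, loopback, addrs);
}
```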
Aha! They dropped the words "constructor function" -- so it's pretty much something like a fn(T, ...) -> U or something? So is it some sort of function? Well, as a generally introductory text to Rust, The Rust Programming Language book can be thought of as less "technical" and "precise" than The Rust Reference:
An enumerated type is a nominal, heterogeneous disjoint union type, denoted by the name of an enum item. ^1 ...
...
Enum types cannot be denoted structurally as types, but must be denoted by named reference to an enum item.
...
Most of this is pretty standard -- most modern programming languages have "nominal types" (the type identifier matters for type comparison) -- but the footnote here is the interesting part:
The enum type is analogous to a data constructor declaration in ML, or a pick ADT in Limbo.
This is a good lead! Rust is known for taking a large amount of inspiration from functional programming languages, which are much closer to the mathematical foundations of programming languages.
- ML is a whole family of functional programming languages (e.g. OCaml, Standard ML, F#, and sometimes Haskell) and is considered one of the important defining language-families within the functional programming language space.
- Limbo is an older concurrent programming language with support for abstract data types, of which the pick ADT is the counterpart of Rust's enum.
Both are strongly-rooted in the functional programming language space.
Summary: Rust enum in Functional Programming / Programming Language Theory
For brevity, I'll omit quotes and give a summary of the formal programming language theory behind Rust enums.
Rust enums are theoretically known as "tagged unions" or "sum types" or "variants".
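To make "tagged union" concrete, here is a rough sketch that spells out by hand what a Result<i32, i32> conceptually carries: a tag saying which variant it is, plus a payload. Tag, ManualResult, and to_manual are hypothetical names invented for this illustration. The real in-memory layout is up to the compiler and is not something this sketch specifies.

```rust
use std::mem::discriminant;

// Hand-rolled stand-in for the "tag" half of a tagged union.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag {
    IsOk,
    IsErr,
}

// Tag plus payload: conceptually what Result<i32, i32> stores.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ManualResult {
    tag: Tag,
    payload: i32,
}

fn to_manual(r: Result<i32, i32>) -> ManualResult {
    match r {
        Ok(v) => ManualResult { tag: Tag::IsOk, payload: v },
        Err(e) => ManualResult { tag: Tag::IsErr, payload: e },
    }
}

fn main() {
    let ok: Result<i32, i32> = Ok(7);
    let err: Result<i32, i32> = Err(7);

    // Same payload, different tag.
    assert_ne!(discriminant(&ok), discriminant(&err));
    // Different payload, same tag.
    assert_eq!(discriminant(&err), discriminant(&Err::<i32, i32>(99)));

    println!("{:?} {:?}", to_manual(ok), to_manual(err));
}
```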
Functional programming and mathematical type theory place a strong emphasis on modeling computation as basically "changes in typed-value structure" versus "changes in data state".
So, whereas in object-oriented programming "everything is an [interactable] object" that sends messages to or interacts with other objects, in functional programming "everything is a pure [non-mutative] value" that is then "transformed" without side effects by "mathematically-pure functions".
So functional/mathematical type constructors are not intended to "execute" or have any other behavior. They are simply there to "purely construct the structure of pure data."
Conclusion: "Rust doesn't want you to inject a breakpoint into data"
Per Rust's theoretical roots and inspiring influences, Rust enum type constructors are meant to be functional and only to wrap and create type-tagged data.
In other words, Rust doesn't really want to allow you to "inject" arbitrary logic into type constructors (unlike C++, which has a whole slew of semantics regarding side effects in constructors, such as throwing exceptions, etc.).
They want to make injecting a breakpoint into Err(T) sort of like injecting a breakpoint into 1 or as i32. Err(T) is more of a "data primitive" than a "transforming function/computation" like a call to foo(123).
In Code: why it's probably hard to inject a breakpoint in Err()
Let's start by looking at the definition of Err(T) itself.
The Definition of std::result::Result::Err()
Here is where you can find the definition of Err(), directly from rust-lang/rust/library/core/src/result.rs @ v1.63.0 on GitHub:
/// `Result` is a type that represents either success ([`Ok`]) or failure ([`Err`]).
///
/// See the [module documentation](self) for details.
#[derive(Copy, PartialEq, PartialOrd, Eq, Ord, Debug, Hash)]
#[must_use = "this `Result` may be an `Err` variant, which should be handled"]
#[rustc_diagnostic_item = "Result"]
#[stable(feature = "rust1", since = "1.0.0")]
pub enum Result<T, E> {
    /// Contains the success value
    #[lang = "Ok"]
    #[stable(feature = "rust1", since = "1.0.0")]
    Ok(#[stable(feature = "rust1", since = "1.0.0")] T),

    /// Contains the error value
    #[lang = "Err"]
    #[stable(feature = "rust1", since = "1.0.0")]
    Err(#[stable(feature = "rust1", since = "1.0.0")] E),
}
Err() is just one variant of the greater enum std::result::Result<T, E>, which means that Err() is not a function with a body of its own, but more like a "data tagging constructor".
Err(T) in assembly is meant to be optimized out completely
Let's use Godbolt to break down the usage of std::result::Result::<T, E>::Err(E): https://rust.godbolt.org/z/oocqGj5cd
// Type your code here, or load an example.
pub fn swap_err_ok(r: Result<i32, i32>) -> Result<i32, i32> {
    let swapped = match r {
        Ok(i) => Err(i),
        Err(e) => Ok(e),
    };
    return swapped;
}
example::swap_err_ok:
        sub     rsp, 16
        mov     dword ptr [rsp], edi
        mov     dword ptr [rsp + 4], esi
        mov     eax, dword ptr [rsp]
        test    rax, rax
        je      .LBB0_2
        jmp     .LBB0_5
.LBB0_5:
        jmp     .LBB0_3
        ud2
.LBB0_2:
        mov     eax, dword ptr [rsp + 4]
        mov     dword ptr [rsp + 12], eax
        mov     dword ptr [rsp + 8], 1
        jmp     .LBB0_4
.LBB0_3:
        mov     eax, dword ptr [rsp + 4]
        mov     dword ptr [rsp + 12], eax
        mov     dword ptr [rsp + 8], 0
.LBB0_4:
        mov     eax, dword ptr [rsp + 8]
        mov     edx, dword ptr [rsp + 12]
        add     rsp, 16
        ret
Here is the (unoptimized) assembly code that corresponds to the line Ok(i) => Err(i), which constructs the Err:
        mov     dword ptr [rsp + 12], eax
        mov     dword ptr [rsp + 8], 1
and the Err construction is basically optimized away if you compile with -C opt-level=3:
example::swap_err_ok:
        mov     edx, esi
        xor     eax, eax
        test    edi, edi
        sete    al
        ret
Unlike C++, which leaves room for injecting arbitrary logic into constructors, even to represent actions like locking a mutex, Rust discourages this in the name of optimization.
Rust is designed to discourage inserting computation in type constructor calls -- and, in fact, if there is no computation associated with a constructor, it should have no operational value or action at the machine-instruction level.
Is there any way this is possible?
If you're still here, you really want a way to do this even though it goes against Rust's philosophy.
"...And besides, how hard can it be? If gcc and MSVC can instrument ALL functions with tracing at the compiler-level, can't rustc do the same?..."
I answered a related StackOverflow question like this in the past: How to build a graph of specific function calls?
In general, you have 2 strategies:
- Instrument your application with some sort of logging/tracing framework, and then try to replicate some sort of tracing mixin-like functionality to apply global/local tracing depending on which parts of code you apply the mixins.
- Recompile your code with some sort of tracing instrumentation feature enabled for your compiler or runtime, and then use the associated tracing compiler/runtime-specific tools/frameworks to transform/sift through the data.
For 1, this will require you to manually insert more code (something like _penter/_pexit for MSVC) or to create some sort of ScopedLogger that would (hopefully!) log asynchronously to some external file/stream/process. This is not necessarily a bad thing, as having a separate process control the trace tracking would probably be better in the case where the traced process crashes. Regardless, you'd probably have to refactor your code, since C++ does not have great first-class support for metaprogramming to refactor/instrument code at a module/global level. However, this is not an uncommon pattern for larger applications anyway; for example, AWS X-Ray is a commercial tracing service (though, typically, I believe it fits the use case of tracing network calls and RPC calls rather than in-process function calls).
For 2, you can try something like utrace or something compiler-specific: MSVC has various tools like Performance Explorer, LLVM has XRay, GCC has gprof. You essentially compile in a sort of "debug++" mode or there is some special OS/hardware/compiler magic to automatically insert tracing instructions or markers that help the runtime trace your desired code. These tracing-enabled programs/runtimes typically emit to some sort of unique tracing format that must then be read by a unique tracing format reader.
However, because Err(T) is a [data]type constructor and not really a first-class fn, Err(T) will most likely NOT be instrumented like a usual fn call. Compilers with some sort of "instrumentation mode" usually inject "instrumentation code" only at function-call boundaries, not generically at data-creation points.
What about replacing std:: with an instrumented version such that I can instrument std::result::Result<T, E> itself? Can't I just link in something?
Well, Err(T) simply does not represent any logical computation except the creation of a value, so there is no fn or function pointer to really replace or switch out by replacing the standard library. It's not really part of the surface language-level interface of Rust to do something like this.
So now what?
If you really specifically need this, you would want a custom compiler flag or mode to inject custom instrumentation code every time you construct an Err(T) data type, and you would have to rebuild every piece of Rust code where you want it.
Some Possible Options
Do a text-replace or macro-replacement to turn every usage of /Err\(.*\)/ in your application code that you want to instrument into your own macro or fn call (to represent computation in the way Rust wants), and inject your own type of instrumentation (probably using either the log or tracing crates).
Find or ask for a custom instrumentation flag on rustc that can generate specific assembly/machine-code to instrument every usage of Err(T).
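The first option above can be sketched roughly as follows. Every name here (traced_err, err_hook, ERR_COUNT, parse_positive) is hypothetical and made up for illustration. The sketch uses only the standard library and prints with eprintln!; in a real project that line is where a call into the log or tracing crates would go. The non-generic err_hook is marked #[inline(never)] with the intent of leaving one real function for a debugger to stop on, and #[track_caller] lets it report where the Err was built.

```rust
use std::fmt::Debug;
use std::panic::Location;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter so the effect is observable without extra crates.
static ERR_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical hook: a single, non-generic function to set a breakpoint on.
#[inline(never)]
fn err_hook(caller: &Location<'static>, value: &dyn Debug) {
    ERR_COUNT.fetch_add(1, Ordering::Relaxed);
    // Swap this line for a logging or tracing call if you use those crates.
    eprintln!("Err({:?}) built at {}:{}", value, caller.file(), caller.line());
}

// Hypothetical replacement for writing `Err(e)` directly.
#[track_caller]
pub fn traced_err<T, E: Debug>(e: E) -> Result<T, E> {
    err_hook(Location::caller(), &e);
    Err(e)
}

// Example call site: each `Err(...)` has been rewritten to `traced_err(...)`.
fn parse_positive(s: &str) -> Result<u32, String> {
    match s.parse::<u32>() {
        Ok(0) => traced_err(String::from("zero is not positive")),
        Ok(n) => Ok(n),
        Err(_) => traced_err(format!("not a number: {}", s)),
    }
}

fn main() {
    assert_eq!(parse_positive("42"), Ok(42));
    assert!(parse_positive("abc").is_err());
    assert!(parse_positive("0").is_err());
    // Two of the three calls above went through the hook.
    assert_eq!(ERR_COUNT.load(Ordering::Relaxed), 2);
}
```

This only catches the call sites you rewrote. Any Err built inside the standard library or a dependency (such as the one returned by s.parse above) never passes through the hook, which is the same limitation described earlier.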