17

When debugging a Rust program, is it possible to break execution when an Err() value is created?

This would serve the same purpose as breaking on exceptions in other languages (C++, JavaScript, Java, etc.): it shows you the actual source of the error rather than just the place where you unwrapped it, which is usually not very useful.

I'm using LLDB but interested in answers for any debugger. The Err I am interested in is generated deep in Serde so I cannot really modify any of the code.

Timmmm
  • Both ObjC & C++ exception handling works by setting a breakpoint on the relevant "exception throw" call in the language runtime. Presumably there is a similar location in the Rust runtime where an error is created? If so you should be able to set a by-name breakpoint there. – Jim Ingham Apr 16 '20 at 17:31
  • 5
    I'm fairly certain Rust doesn't make this possible. Unlike the other languages you mention, `Err` isn't treated any special by the compiler, and it will typically inline the construction of the error value, even at `opt-level=0`, so the resulting binary essentially has no trace of the call to `Err()` that occurs in the code. Also see [this relevant issue](https://github.com/rust-lang/rust/issues/54144) in the Rust repository, seems like you're not the first person to want this. – Frxstrem Apr 16 '20 at 17:38
  • Hmm, since tuple enum constructors are functions, I would assume you could manage to breakpoint that... – Optimistic Peach Apr 16 '20 at 17:44
  • 2
    https://github.com/yaahc/eyre This lets you add context to errors. One such context is the stacktrace of where the Err was created. Not an answer to your question but it might help you anyway. – Unapiedra Apr 16 '20 at 18:08
  • 1
    If there is some way to plug in to error creation, which Unapiedra's link looks like it does, you could insert a hook just to have something to break on. If that's not possible you might see if you can get such a thing added (maybe only when unoptimized) to Rust. It's been a very useful feature for other languages. – Jim Ingham Apr 17 '20 at 17:17

2 Answers

3

TL;DR: Err is a "type-name" and not actually a classic OOP-style constructor, meaning there is no consistent breakpoint to target unless artificially injected through a (currently non-existent, hypothetical) compiler-specific option.

I'll try to give this one a shot.

I believe what you want to accomplish is incompatible with how the "one true Rust implementation" is currently constructed and with its take on "enum constructors", at least without some serious hacks -- and I'll give my best inference about why (as of the time of writing -- 22 Sep 2022), and give you some ideas and options.

Breaking it down: finding definitions

"What happens when you "construct" an enum, anyways...?"

As Rust does not have a formal language standard or specification document, its "semantics" are not particularly precisely defined, so there is no "legal" text to really provide the "Word of God" or final authority on this topic.

So instead, let's refer to community materials and some code:

Constructors - The Rustonomicon

There is exactly one way to create an instance of a user-defined type: name it, and initialize all its fields at once:

...

That's it. Every other way you make an instance of a type is just calling a totally vanilla function that does some stuff and eventually bottoms out to The One True Constructor.

Unlike C++, Rust does not come with a slew of built-in kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors. The reasons for this are varied, but it largely boils down to Rust's philosophy of being explicit.

Move constructors are meaningless in Rust because we don't enable types to "care" about their location in memory. Every type must be ready for it to be blindly memcopied to somewhere else in memory. This means pure on-the-stack-but-still-movable intrusive linked lists are simply not happening in Rust (safely).

In comparison to C++'s better-specified semantics for both enum class constructors and std::variant<T...> (its closest analogue to Rust enum), Rust does not really say anything about "enum constructors" specifically, except that they are just part of "The One True Constructor."

The One True Constructor is not really a well-specified Rust concept. The term is not commonly used in Rust's references or books, and it's not a general programming language theory concept (at least, by that exact name -- it's most likely referring to type constructors, which we'll get to) -- but you can eke out its meaning by reading more and by comparing with the programming languages that Rust takes direct inspiration from.

In fact, where C++ might have move, copy, placement new and other types of constructors, Rust simply has a sort of universal "dumb value constructor" for all values (like struct and enum) that does not have special operational semantics besides something like "create the value, wherever it might be stored in memory".
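To make the distinction concrete, here is a minimal sketch of a "vanilla function" that bottoms out in the one true constructor. The `ParseError` type and its `new` function are hypothetical names made up for illustration; they do not come from any library.

```rust
/// Hypothetical error type, used only for illustration.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub line: usize,
    pub message: String,
}

impl ParseError {
    /// An ordinary function: it has a body, so a debugger has
    /// something to stop on (at least in an unoptimized build).
    pub fn new(line: usize, message: &str) -> Self {
        // The only real "construction" is this struct literal:
        // name the type and initialize all of its fields at once.
        ParseError {
            line,
            message: message.to_string(),
        }
    }
}
```

Calling `ParseError::new(3, "unexpected token")` and writing the struct literal by hand produce the same value; only the former goes through a function.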

But that's not very precise at all. What if we try to look at the definition of an enum?

Defining an Enum - The Rust Programming Language

...

We attach data to each variant of the enum directly, so there is no need for an extra struct. Here it’s also easier to see another detail of how enums work: the name of each enum variant that we define also becomes a function that constructs an instance of the enum. That is, IpAddr::V4() is a function call that takes a String argument and returns an instance of the IpAddr type. We automatically get this constructor function defined as a result of defining the enum.
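As a small sketch of what that passage describes (the `to_v4` helper is a hypothetical name, not part of the book's example), the variant name can be passed around like any other function value:

```rust
#[derive(Debug, PartialEq)]
pub enum IpAddr {
    V4(String),
    V6(String),
}

/// Wraps every host string in the `V4` variant.
pub fn to_v4(hosts: Vec<String>) -> Vec<IpAddr> {
    // The variant name coerces to a plain function pointer type.
    let ctor: fn(String) -> IpAddr = IpAddr::V4;
    hosts.into_iter().map(ctor).collect()
}
```

So at the type level the variant name does behave like a function from `String` to `IpAddr`.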

Aha! They dropped the words "constructor function" -- so it's pretty much something like a fn(T, ...) -> U or something? So is it some sort of function? Well, as a generally introductory text to Rust, The Rust Programming Language book can be thought of as less "technical" and "precise" than The Rust Reference:

Enumerated types - The Rust Reference

An enumerated type is a nominal, heterogeneous disjoint union type, denoted by the name of an enum item. ^1 ...

...

Enum types cannot be denoted structurally as types, but must be denoted by named reference to an enum item.

...

Most of this is pretty standard -- most modern programming languages have "nominal types" (the type identifier matters for type comparison) -- but the footnote here is the interesting part:

The enum type is analogous to a data constructor declaration in ML, or a pick ADT in Limbo.

This is a good lead! Rust is known for taking a large amount of inspiration from functional programming languages, which are much closer to the mathematical foundations of programming languages.

  • ML is a whole family of functional programming languages (e.g. OCaml, Standard ML, F#, and sometimes Haskell) and is considered one of the important defining language-families within the functional programming language space.
  • Limbo is an older concurrent programming language with support for abstract data types, of which enum is one.

Both are strongly-rooted in the functional programming language space.

Summary: Rust enum in Functional Programming / Programming Language Theory

For brevity, I'll omit quotes and give a summary of the formal programming language theory behind Rust enums.

  • Rust enums are theoretically known as "tagged unions" or "sum types" or "variants".

  • Functional programming and mathematical type theory place a strong emphasis on modeling computation as basically "changes in typed-value structure" versus "changes in data state".

  • So, in object-oriented programming where "everything is an [interactable] object" that then send messages or interact with each other...

  • -- in functional programming, "everything is a pure [non-mutative] value" that is then "transformed" without side effects by "mathematically-pure functions".

So functional/mathematical type constructors are not intended to "execute" or have any other behavior. They are simply there to "purely construct the structure of pure data."

Conclusion: "Rust doesn't want you to inject a breakpoint into data"

Per Rust's theoretical roots and inspiring influences, Rust enum type constructors are meant to be functional and only to wrap and create type-tagged data.

In other words, Rust doesn't really want to allow you to "inject" arbitrary logic into type constructors (unlike C++, which has a whole slew of semantics regarding side effects in constructors, such as throwing exceptions, etc.).

They want to make injecting a breakpoint into Err(T) sort of like injecting a breakpoint into 1 or as i32. Err(T) is more of a "data primitive" rather than a "transforming function/computation" like if you were to call foo(123).
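One way to see the "data primitive" point is that `Err(...)` is accepted in a `static` initializer, where only expressions that can be evaluated at compile time are allowed. This is a minimal sketch; `FALLBACK` and `describe` are hypothetical names chosen for the example.

```rust
/// A `static` initializer must be computable at compile time,
/// so no function needs to run at run time to build this value.
pub static FALLBACK: Result<u8, &'static str> = Err("no value configured");

/// Reads the value back like any other piece of data.
pub fn describe(r: &Result<u8, &'static str>) -> String {
    match r {
        Ok(v) => format!("ok: {}", v),
        Err(e) => format!("error: {}", e),
    }
}
```

There is no moment during execution at which this particular `Err` is "created", which is why a debugger has nothing to stop on for it.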

In Code: why it's probably hard to inject a breakpoint in Err().

Let's start by looking at the definition of Err(T) itself.

The Definition of std::result::Result::Err()

Here is where you can find the definition of Err() directly from rust-lang/rust/library/core/src/result.rs @ v1.63.0 on GitHub:

/// `Result` is a type that represents either success ([`Ok`]) or failure ([`Err`]).
///
/// See the [module documentation](self) for details.
#[derive(Copy, PartialEq, PartialOrd, Eq, Ord, Debug, Hash)]
#[must_use = "this `Result` may be an `Err` variant, which should be handled"]
#[rustc_diagnostic_item = "Result"]
#[stable(feature = "rust1", since = "1.0.0")]
pub enum Result<T, E> {
    /// Contains the success value
    #[lang = "Ok"]
    #[stable(feature = "rust1", since = "1.0.0")]
    Ok(#[stable(feature = "rust1", since = "1.0.0")] T),

    /// Contains the error value
    #[lang = "Err"]
    #[stable(feature = "rust1", since = "1.0.0")]
    Err(#[stable(feature = "rust1", since = "1.0.0")] E),
}

Err() is just a sub-case of the greater enum std::result::Result<T, E> -- and this means that Err() is not a function, but more of like a "data tagging constructor".

Err(T) in assembly is meant to be optimized out completely

Let's use Godbolt to break down the usage of std::result::Result::<T, E>::Err(E): https://rust.godbolt.org/z/oocqGj5cd

// Swap the variants: Ok(i) becomes Err(i), and Err(e) becomes Ok(e).
pub fn swap_err_ok(r: Result<i32, i32>) -> Result<i32, i32> {
    let swapped = match r {
        Ok(i) => Err(i),
        Err(e) => Ok(e), 
    };
    return swapped;
}

example::swap_err_ok:
        sub     rsp, 16
        mov     dword ptr [rsp], edi
        mov     dword ptr [rsp + 4], esi
        mov     eax, dword ptr [rsp]
        test    rax, rax
        je      .LBB0_2
        jmp     .LBB0_5
.LBB0_5:
        jmp     .LBB0_3
        ud2
.LBB0_2:
        mov     eax, dword ptr [rsp + 4]
        mov     dword ptr [rsp + 12], eax
        mov     dword ptr [rsp + 8], 1
        jmp     .LBB0_4
.LBB0_3:
        mov     eax, dword ptr [rsp + 4]
        mov     dword ptr [rsp + 12], eax
        mov     dword ptr [rsp + 8], 0
.LBB0_4:
        mov     eax, dword ptr [rsp + 8]
        mov     edx, dword ptr [rsp + 12]
        add     rsp, 16
        ret

Here is the (unoptimized) assembly code that corresponds to the line Ok(i) => Err(i), that constructs the Err:

        mov     dword ptr [rsp + 12], eax
        mov     dword ptr [rsp + 8], 1

and the Err construction is basically optimized out if you compile with -C opt-level=3:

example::swap_err_ok:
        mov     edx, esi
        xor     eax, eax
        test    edi, edi
        sete    al
        ret

Unlike C++, which leaves room for injecting arbitrary logic into constructors, even to represent actions like locking a mutex, Rust discourages this in the name of optimization.

Rust is designed to discourage inserting computation in type constructor calls -- and, in fact, if there is no computation associated with a constructor, it should have no operational value or action at the machine-instruction level.


Is there any way this is possible?

If you're still here, you really want a way to do this even though it goes against Rust's philosophy.

"...And besides, how hard can it be? If gcc and MSVC can instrument ALL functions with tracing at the compiler-level, can't rustc do the same?..."

I answered a related StackOverflow question like this in the past: How to build a graph of specific function calls?

In general, you have 2 strategies:

  1. Instrument your application with some sort of logging/tracing framework, and then try to replicate some sort of tracing mixin-like functionality to apply global/local tracing depending on which parts of code you apply the mixins.
  2. Recompile your code with some sort of tracing instrumentation feature enabled for your compiler or runtime, and then use the associated tracing compiler/runtime-specific tools/frameworks to transform/sift through the data.

For 1, this will require you to manually insert more code or something like _penter/_pexit for MSVC manually or create some sort of ScopedLogger that would (hopefully!) log async to some external file/stream/process. This is not necessarily a bad thing, as having a separate process control the trace tracking would probably be better in the case where the traced process crashes. Regardless, you'd probably have to refactor your code since C++ does not have great first-class support for metaprogramming to refactor/instrument code at a module/global level. However, this is not an uncommon pattern anyways for larger applications; for example, AWS X-Ray is an example of a commercial tracing service (though, typically, I believe it fits the use case of tracing network calls and RPC calls rather than in-process function calls).

For 2, you can try something like utrace or something compiler-specific: MSVC has various tools like Performance Explorer, LLVM has XRay, GCC has gprof. You essentially compile in a sort of "debug++" mode or there is some special OS/hardware/compiler magic to automatically insert tracing instructions or markers that help the runtime trace your desired code. These tracing-enabled programs/runtimes typically emit to some sort of unique tracing format that must then be read by a unique tracing format reader.

However, because Err(T) is a [data]type constructor and not really a first-class fn, this means that Err(T) will most likely NOT be instrumented like a usual fn call. Usually compilers with some sort of "instrumentation mode" will only inject "instrumentation code" at function-call boundaries, but not at data-creation points generically.

What about replacing std:: with an instrumented version such that I can instrument std::result::Result<T, E> itself? Can't I just link-in something?

Well, Err(T) simply does not represent any logical computation except the creation of a value, and so, there is no fn or function pointer to really replace or switch-out by replacing the standard library. It's not really part of the surface language-level interface of Rust to do something like this.

So now what?

If you really specifically need this, you would want a custom compiler flag or mode to inject custom instrumentation code every time you construct an Err(T) data type -- and you would have to rebuild every piece of Rust code where you want it.

Some Possible Options

  1. Do a text-replace or macro-replacement to turn every usage of /Err(.*)/ in your application code that you want to instrument into your own macro or fn call (to represent computation in the way Rust wants), and inject your own type of instrumentation (probably using either log or tracing crates).

  2. Find or ask for a custom instrumentation flag on rustc that can generate specific assembly/machine-code to instrument every usage of Err(T).
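A minimal sketch of option 1, under the assumption that you can edit the call sites. Everything here (`err_created_hook`, `ERR_HOOK_HITS`, `traced_err!`, `parse_port`) is a hypothetical name invented for this example. The hook is deliberately non-generic and marked `#[inline(never)]` so that there is one function to break on, no matter which `Result<T, E>` is being built.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts hook invocations. The side effect also gives the hook a
/// reason to exist in optimized builds.
pub static ERR_HOOK_HITS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical hook: set the debugger breakpoint on this function.
#[inline(never)]
pub fn err_created_hook(file: &'static str, line: u32) {
    ERR_HOOK_HITS.fetch_add(1, Ordering::Relaxed);
    eprintln!("Err created at {}:{}", file, line);
}

/// Hypothetical replacement for writing `Err(e)` directly.
/// It calls the hook first, then builds the ordinary `Err` value.
macro_rules! traced_err {
    ($e:expr) => {{
        err_created_hook(file!(), line!());
        Err($e)
    }};
}

/// Example call site that uses the macro instead of a bare `Err(...)`.
pub fn parse_port(s: &str) -> Result<u16, String> {
    match s.parse::<u16>() {
        Ok(port) => Ok(port),
        Err(_) => traced_err!(format!("bad port: {}", s)),
    }
}
```

With something like this in place, a by-name breakpoint (for example `breakpoint set --name err_created_hook` in LLDB, or `break err_created_hook` in gdb) should stop one frame below the site that produced the error. It does nothing for `Err` values built inside crates you cannot edit, such as Serde in the question.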

CinchBlue
  • Looking at the unoptimized version, it should be possible to run `disassemble` from `gdb` and `break` on a memory address of the `Err` code path, no? And with `profile` Cargo allows for leaving just the single crate unoptimized, which should be good enough speed-wise. – ArtemGr Sep 22 '22 at 12:36
  • Err does not have a single code path as it is not a concrete function. Even if it was, because it is a generic enum, you would have to make sure to add breakpoints on every instantiated instance of the enum being used to construct an Err. – CinchBlue Sep 22 '22 at 14:37
  • 1
    So we can catch one `Err` assignment in particular, but scaling it up to multiple sites would be troublesome. Got it. BTW, Time Travel Debugging might eventually help? – ArtemGr Sep 22 '22 at 16:31
  • I wonder if you've seen https://github.com/rust-lang/rfcs/pull/2895, might be an example of how the error handling evolves. – ArtemGr Sep 22 '22 at 16:55
  • This seems related to Error, which is a much better place for code injection. However, because this question concerns Err, and Err does not bound its inner type to implement the Error trait, this question is trickier. – CinchBlue Sep 22 '22 at 20:24
-3

Yes, it is possible to break execution when an Err() value is created. This can be done by using the debugger to break on the Err() function, and then inspecting the stack trace to find the point at which the Err() value was created.

James