Nicer way to pattern match window of assembly instructions for peephole w/ Rust?

Question

So I am trying to implement peephole optimisation where I go from a Vec<LLStackInstruction> -> Vec<LLStackInstruction>, where the returned list is optimised. (LL for low level) Our compiler is modeled as an abstract stack machine and for an init/assign we push RHS onto stack, push addr of LHS on stack and then pop and STR but trying to reduce this into a push RHS and then store into address straight through an address specififer but I get a very ugly matching like so: (and with mutability)

pub fn peephole_optimiser(instrs: Vec<LLStackInstruction>) -> Vec<LLStackInstruction> {
    let mut optimized_instructions: Vec<LLStackInstruction> = Vec::new();

    let mut iter = instrs.iter().peekable();

    while let Some(instruction) = iter.next() {
        let window_iter = iter.clone();
        let window: Vec<_> = [vec![instruction], window_iter.take(10).collect()].concat();
        
        if let Some(LLStackInstruction::Push(Register::FP)) = window.get(0) {
            if let Some(LLStackInstruction::Mov(Condition::None, r1, ArgType::Imm(n))) = window.get(1) {
                if let Some(LLStackInstruction::Push(r2)) = window.get(2) {
                    if let Some(LLStackInstruction::Pop(_)) = window.get(3) {
                        if let Some(LLStackInstruction::Pop(_)) = window.get(4) {
                            if let Some(LLStackInstruction::Sub(..)) = window.get(5) {
                                if let Some(LLStackInstruction::Branch(..)) = window.get(6) {
                                    if let Some(LLStackInstruction::Push(_)) = window.get(7) {
                                        if let Some(LLStackInstruction::Pop(_)) = window.get(8) {
                                            if let Some(LLStackInstruction::Pop(_)) = window.get(9) {
                                                if let Some(LLStackInstruction::Str(..)) = window.get(10) {
                                                    if r1 == r2 {
                                                        optimized_instructions.push(LLStackInstruction::Pop(Register::R4));
                                                        optimized_instructions.push(LLStackInstruction::Str(Register::R4, AddressSpecifier::ImmediateOffset(Register::FP, -n)));
                                                        iter.nth(9);
                                                        continue;
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }

        optimized_instructions.push(instruction.clone());
    }
    optimized_instructions
}

How could I go about making this nicer

Example of what this optimisation does

            // Compact init/assign statements
            // push {fp}
            // mov r1, #88
            // push {r1}
            // pop {r0}
            // pop {r1}
            // subs r0, r1, r0
            // bvs _overflow_err
            // push {r0}
            // pop {r3}
            // pop {r4}
            // str r4, [r3]

            // INTO
            // pop {r4}
            // str r4, [fp, #-88]

Only thing I could think of was if I could express this into multiple steps of smaller windows and do multiple optimisation passes (or after optimising a window, rerun from its start on the result until its unchanged)

Please provide a minimum reproducible example next time, such as [a link to the Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ad34dfb8fc1f01fc1bd8cf5ad7a8fe6c) with type definitions. It makes it easier to answer your question. — Matthieu M., Mar 13 '23 at 16:19
Stylistic note: don't take a `Vec` if you don't modify it internally, just take a slice (`&[]`) instead. — Matthieu M., Mar 13 '23 at 16:20
Sorry will do next time! And yeah you are right on the second point, clippy warned me on that one just hadn;t changed it yet — Ahmed Mahmud, Mar 13 '23 at 19:31

score 5 · Answer 1 · answered Mar 13 '23 at 15:59

You could match your window directly instead of nested if let statements. Rust supports pattern destructuring of slices:

match window[..11] {
    [
        LLStackInstruction::Push(Register::FP), 
        LLStackInstruction::Push(r2), 
        LLStackInstruction::Pop(_), 
        LLStackInstruction::Pop(_), 
        LLStackInstruction::Sub(..), 
        LLStackInstruction::Branch(..), 
        LLStackInstruction::Push(_), 
        LLStackInstruction::Pop(_), 
        LLStackInstruction::Pop(_), 
        LLStackInstruction::Str(..)
    ] if r1 == r2 => {
        optimized_instructions.push(LLStackInstruction::Pop(Register::R4));
        optimized_instructions.push(LLStackInstruction::Str(
            Register::R4,
            AddressSpecifier::ImmediateOffset(Register::FP, -n),
        ));
        iter.nth(9);
        continue;
    }
    _ => {}
}

Matthieu M. · Accepted Answer · 2023-03-14T07:59:34.350

As mentioned by Jonas, you can use slice pattern matching for greater ergonomics, and on nightly you can combine it with let-chains to avoid a nested-if.

The difficulty then, is how to combine windowing with skipping an arbitrary number of instructions on success.

The bare-bones solution is to use manual indexing:

pub fn peephole_optimiser(instrs: &[LLStackInstruction]) -> Vec<LLStackInstruction> {
    const WINDOW_SIZE: usize = 11;

    type Instr = LLStackInstruction;

    if instrs.len() < WINDOW_SIZE{
        optimized.extend_from_slice(instrs);
        return optimized;
    }

    let mut optimized = Vec::new();
    let mut index = 0;

    while index < instrs.len() - WINDOW_SIZE {
        if let [
            Instr::Push(Register::FP),
            Instr::Mov(Condition::None, r1, ArgType::Imm(_)),
            Instr::Push(r2),
            Instr::Pop(_),
            Instr::Pop(_),
            Instr::Sub(..),
            Instr::Branch(..),
            Instr::Push(..),
            Instr::Pop(..),
            Instr::Pop(..),
            Instr::Str(..)
        ] = &instrs[index..][..WINDOW_SIZE]
        {
            if r1 == r2 {
                optimized.push(LLStackInstruction::Pop(Register::R4));
                optimized.push(LLStackInstruction::Str(<>));

                index += WINDOW_SIZE;
                continue;
            }
        }
        
        optimized.push(window[0].clone());
        index += 1;
    }

    optimized
}

but it's a bit harder to understand the purpose, and you run the risk of forgetting to increase the index before running the loop again... which would lead to an infinite loop.

slice has a windows functions to iterate over windows, in a for loop though it requires a manual skip:

pub fn peephole_optimiser(instrs: &[LLStackInstruction]) -> Vec<LLStackInstruction> {
    const WINDOW_SIZE: usize = 11;

    type Instr = LLStackInstruction;

    let mut optimized = Vec::new();

    if instrs.len() < WINDOW_SIZE {
        optimized.extend_from_slice(instrs);
        return optimized;
    }

    let mut skip = 0;

    for window in instrs.windows(WINDOW_SIZE) {
        if skip > 0 {
            skip -= 1;
            continue;
        }

        if let [
            Instr::Push(Register::FP),
            Instr::Mov(Condition::None, r1, ArgType::Imm(_)),
            Instr::Push(r2),
            Instr::Pop(_),
            Instr::Pop(_),
            Instr::Sub(..),
            Instr::Branch(..),
            Instr::Push(..),
            Instr::Pop(..),
            Instr::Pop(..),
            Instr::Str(..)
        ] = window
        {
            if r1 == r2 {
                optimized.push(Instr::Pop(Register::R4));
                optimized.push(Instr::Str(<>));
                
                skip = 9;
                continue;
            }
        }
        
        optimized.push(window[0].clone());
    }

    optimized
}

The extra variable is not great, though.

In the end, maybe a combination of slice::windows with a somewhat more manual iteration mode is the best we can achieve:

Single iteration variable, rather than 2 disparate ones.
With no way to accidentally run into an infinite loop.

Especially with extracting the matching into a separate function, I think it's pretty readable:

pub fn peephole_optimiser(instrs: &[LLStackInstruction]) -> Vec<LLStackInstruction> {
    const WINDOW_SIZE: usize = 11;

    type Instr = LLStackInstruction;

    fn matches(window: &[Instr]) -> bool {
        assert_eq!(WINDOW_SIZE, window.len());

        if let [
            Instr::Push(Register::FP),
            Instr::Mov(Condition::None, r1, ArgType::Imm(_)),
            Instr::Push(r2),
            Instr::Pop(_),
            Instr::Pop(_),
            Instr::Sub(..),
            Instr::Branch(..),
            Instr::Push(..),
            Instr::Pop(..),
            Instr::Str(..)
        ] = window
        {
            r1 == r2
        } else {
            false
        }
    }

    let mut optimized = Vec::new();

    let mut windows = instrs.windows(WINDOW_SIZE);

    while let Some(window) = windows.next() {
        if matches(window) {
            optimized.push(Instr::Pop(Register::R4));
            optimized.push(Instr::Str(<>);
            
            windows.nth(WINDOW_SIZE - 2);
            continue;
        }
        
        optimized.push(window[0].clone());
    }

    optimized
}

"The difficulty then, is how to combine windowing with skipping an arbitrary number of instructions on success." But isn't this already happening when using iterators and doing `.nth()` on success or am I misunderstanding — Ahmed Mahmud, Mar 13 '23 at 19:29
@AhmedMahmud: It is happening in your solution, and the 2 I presented, yes. I do not find either particularly neat, though. Your solution allocates superfluous vectors, which is costly performance wise -- you could fix the allocations by collecting an array of 11 references to instructions, but there's no neat way to do that, either. My last solution uses an extra `skip` to achieve the jump, doubling conditions, and my first solution completely eschews higher-level helpers to use indexing, running the risk of not incrementing.... maybe a combination of `windows` + `nth` is actually the best... — Matthieu M., Mar 13 '23 at 20:16
@AhmedMahmud: I added a 3rd solution, blending `slice::windows` with `while` + `nth`, which is the best I can think of right now. I also extracting the `match` function, so that the matching condition is clearer, and I think the result is fairly readable. I would however advise you to double that `nth` call: I'm not convinced `9` is the correct offset, looks to me like it should be `WINDOW_SIZE - 1`. 1 instruction is skipped in `next`, and with 9 that would mean the next instruction is a `Str` (since it matched), and there's no point in having the next loop iteration trying to match `Str`. — Matthieu M., Mar 13 '23 at 20:29
It should be 9 because the `nth` starts from after the current (0 will skip 1) so 9 skips 10 of the next windows and yeah we shouldn't check removed instrs at all or it'd be duplicating code and break. This does face issue on that if the last window doesnt match, it discards the [1..] instructions but Ill work on that, thanks for the pointers! — Ahmed Mahmud, Mar 13 '23 at 22:33

score 1 · Answer 3 · answered Mar 13 '23 at 15:54

1

On nightly, there is the if-let chains feature:

#![feature(let_chains)]

if let a = b && let c = d && e == f { ... }

On stable, you can use the if_chain crate:

if_chain::if_chain! {
    if let a = b;
    if let c = d;
    if e == f;
    then { ... }
}

answered Mar 13 '23 at 15:54

Chayim Friedman

47,971
5
48
77

Nicer way to pattern match window of assembly instructions for peephole w/ Rust?

3 Answers3