So I am trying to implement peephole optimisation where I go from a Vec<LLStackInstruction> -> Vec<LLStackInstruction>
, where the returned list is optimised. (LL
for low level)
Our compiler is modeled as an abstract stack machine and for an init/assign we push RHS onto stack, push addr of LHS on stack and then pop and STR but trying to reduce this into a push RHS and then store into address straight through an address specififer but I get a very ugly matching like so: (and with mutability)
pub fn peephole_optimiser(instrs: Vec<LLStackInstruction>) -> Vec<LLStackInstruction> {
let mut optimized_instructions: Vec<LLStackInstruction> = Vec::new();
let mut iter = instrs.iter().peekable();
while let Some(instruction) = iter.next() {
let window_iter = iter.clone();
let window: Vec<_> = [vec![instruction], window_iter.take(10).collect()].concat();
if let Some(LLStackInstruction::Push(Register::FP)) = window.get(0) {
if let Some(LLStackInstruction::Mov(Condition::None, r1, ArgType::Imm(n))) = window.get(1) {
if let Some(LLStackInstruction::Push(r2)) = window.get(2) {
if let Some(LLStackInstruction::Pop(_)) = window.get(3) {
if let Some(LLStackInstruction::Pop(_)) = window.get(4) {
if let Some(LLStackInstruction::Sub(..)) = window.get(5) {
if let Some(LLStackInstruction::Branch(..)) = window.get(6) {
if let Some(LLStackInstruction::Push(_)) = window.get(7) {
if let Some(LLStackInstruction::Pop(_)) = window.get(8) {
if let Some(LLStackInstruction::Pop(_)) = window.get(9) {
if let Some(LLStackInstruction::Str(..)) = window.get(10) {
if r1 == r2 {
optimized_instructions.push(LLStackInstruction::Pop(Register::R4));
optimized_instructions.push(LLStackInstruction::Str(Register::R4, AddressSpecifier::ImmediateOffset(Register::FP, -n)));
iter.nth(9);
continue;
}
}
}
}
}
}
}
}
}
}
}
}
optimized_instructions.push(instruction.clone());
}
optimized_instructions
}
How could I go about making this nicer
Example of what this optimisation does
// Compact init/assign statements
// push {fp}
// mov r1, #88
// push {r1}
// pop {r0}
// pop {r1}
// subs r0, r1, r0
// bvs _overflow_err
// push {r0}
// pop {r3}
// pop {r4}
// str r4, [r3]
// INTO
// pop {r4}
// str r4, [fp, #-88]
Only thing I could think of was if I could express this into multiple steps of smaller windows and do multiple optimisation passes (or after optimising a window, rerun from its start on the result until its unchanged)