I attempted to write a sort algorithm to reorder instructions for a dual issue processor (Cell SPU). One way to obtain dual issue processing an instruction should not depend on the instruction that precedes it (another involves separate pipelines, but I'm focused on instructions in the same pipeline). I understand this would be too much for the compiler to do and I didn't find what I needed when searching. This can be done by hand in most cases but a sorting algorithm should ensure the lowest "sequence count" (number or dependent instructions that follow each-other).
My question is has this or something similar been done before? Is there an optimized approach?
Simple example pseudo-code halving instruction time (inputs: i1, i2, i3
):
v1 = i1 ^ i2; - #single-issued
v2 = v1 | i2; \ #v2,v3 dual-issued
v3 = i1 & i3; / #v2,v3 dual-issued
v4 = v3 & i2; - #single-issued
can be written as:
v1 = i1 ^ i2; \ #v1,v3 dual-issued
v3 = i1 & i3; / #v1,v3 dual-issued
v2 = v1 | i2; \ #v2,v4 dual-issued
v4 = v3 & i2; / #v2,v4 dual-issued
Here is a python implementation I created that recursively reorders the instructions to achieve the lowest "sequence count".
reorder.py
http://pastebin.com/dt8eWy3H
sample t8-1.h
http://pastebin.com/w0DYg8ff