
While converting old Turbo Pascal units to modern Object Pascal, I ran into the following:

function Less (var a, b; Relation : POINTER) : boolean;
    inline($5B/$59/$0E/$E8/$00/$00/$58/$05/$08/$00/$50/$51/$53/$CB);

The code is supposed to call an external function, {$F+} function VariableLess(var a, b : Index) : boolean; {$F-}, collect the result and pass it back to the calling function. The function is used in a unit that provides binary trees for untyped data:

procedure InsVarBBTree(var B: BBTree; var E; S: word; A: pointer; var ok: boolean);
{ puts variable E of size S into tree B. The order relation address is A. }

Therefore, the unit itself cannot provide a comparison function; that is the job of the unit that defines the payload.

Using an online disassembler I found out that this corresponds to:

{$ASMMODE intel}
function Less (var a, b; Relation : POINTER) : boolean; assembler;

asm
  pop  bx
  pop  cx
  push cs
  call 6
  pop  ax
  add  ax, 8
  push ax
  push cx
  push bx
  retf
end;

However, the compiler doesn't like the push statement. What should I do to get this to work on a modern 64-bit machine? I realise the code is 16-bit.

  • The `call 6` is a call instruction going to the next instruction (`e8 00 00`). I do not quite understand how this code works. It seems to mess with the return address. What is the function defined immediately below this one? – fuz Jan 24 '21 at 13:46
  • Note that this code looks like it abuses certain specific details of how Turbo Pascal compiles. It is unlikely that you can simply translate the instructions to port it to amd64. – fuz Jan 24 '21 at 13:48
  • On further reading, this code pops off the return address, places the address of the function just after `Less` on the stack, and then returns. If the caller of `Less` hadn't set up a stack frame and immediately returns, the return would go to the following function. It is possible that I missed something. If TP sets up a stack frame for the programmer, the two words shuffled around might be something other than the return address. – fuz Jan 24 '21 at 14:02
  • I have downvoted your question because you have not produced relevant details upon request. I believe without these details, the question cannot be answered. – fuz Jan 24 '21 at 17:01
  • Which modern `object pascal` compiler are you thinking of? The question is difficult to answer without that specification. – LU RD Jan 24 '21 at 17:09
  • You're going to want to re-design this from scratch, without inline asm. If this is some kind of virtual-function hack, hopefully you're using a language that properly supports virtual functions. If you did want to use inline asm, messing around with the CS segment register is definitely not useful and works completely differently from Real Mode. – Peter Cordes Jan 24 '21 at 18:35
  • @fuz: `call next_insn` / `pop ax` is just getting AX=IP. If I'm counting right, yeah I think that AX+8 return address it's computing is just past the `retf`. I think it's using push/push/retf as a far-jmp with current CS and an offset from a pointer arg? Perhaps some kind of adapter between near and far calls? This is all pointless in x86-64 where you don't want to make far calls. x86-64 has RIP-relative LEA so you can get RIP without call/pop if you did want to manufacture a fake return address; mismatched call/ret pairs are bad for performance. – Peter Cordes Jan 24 '21 at 18:42
  • @PeterCordes I mean it looks like it, but the code actually restores the original return address afterwards. So the `retf` instruction indeed returns back to the caller, leaving some extra junk on the stack. – fuz Jan 25 '21 at 10:38

3 Answers

8

I just compiled some inline function on Turbo Pascal 5 for MS-DOS to check how Turbo Pascal generates code:

For non-inline function calls, Turbo Pascal pushes all function arguments to the stack. The first one is pushed first (so SS:SP points to the last function argument). Then a (far) call is executed. The function returns using retf n, which means that the function called removes all parameters from the stack.

In an inline function, the raw bytes given simply replace the call instruction. This means that SS:SP points to the arguments, not to the return address. The inline machine language code must pop the arguments from the stack. And it must not return using ret but simply continue code execution at the instruction after the inline code.

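With this knowledge the assembly code can be analyzed: it pops the far pointer Relation (offset into BX, segment into CX), pushes CS and the offset of the instruction just after the retf as a far return address, pushes the pointer back, and executes retf. The retf therefore acts as a far call to Relation, with a and b still on the stack as its arguments.

Using the assembly code given, you can call any function or procedure with any parameters (in your case: VariableLess) indirectly by writing a helper function (in your case: Less) that has the same arguments as the function to be called plus an additional argument that points to the actual function.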

The code is equivalent to the following Delphi or Free Pascal code:

type
    TMyCompare = function(var a, b) : boolean;

function Less (var a, b; Relation : TMyCompare) : boolean;
begin
    Less := Relation(a, b);
end;

If your compiler supports function pointers (type TMyCompare = function ...), you could do it like this.

Or you could replace all occurrences of Less(x,y,z) in your program with z(x,y). This would be even more efficient.

Of course, the pointer to the function (VariableLess) should not have the type pointer but the type TMyCompare if you do it like this.
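A minimal sketch of both variants (the wrapper and the direct call), assuming Free Pascal in objfpc mode. IntegerLess, the LongInt payload, and the variable names are hypothetical stand-ins for the real comparison function and data:

```pascal
program DirectCallDemo;
{$mode objfpc}

type
  TMyCompare = function(var a, b): boolean;

{ Hypothetical comparison: interprets both untyped arguments as LongInt. }
function IntegerLess(var a, b): boolean;
begin
  Result := LongInt(a) < LongInt(b);
end;

{ Wrapper with the same shape as the original inline Less. }
function Less(var a, b; Relation: TMyCompare): boolean;
begin
  Result := Relation(a, b);
end;

var
  x, y: LongInt;
  Rel: TMyCompare;
begin
  x := 3;
  y := 7;
  Rel := @IntegerLess;
  WriteLn(Less(x, y, Rel));  { call through the wrapper }
  WriteLn(Rel(x, y));        { direct call, no wrapper needed }
end.
```

Both calls evaluate the same comparison; the wrapper only matters if existing code already calls Less in many places.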

If your compiler does not support function pointers (as Turbo Pascal versions before 5.0 did not), you might need assembly.

But in that case, different compilers will need different assembly code!

So without knowing the internals of your compiler, it is not possible to translate the assembly code.

EDIT

I'm not sure how exactly your compiler works. However, maybe the following code works if my original code does not work:

function Less (var a, b; Relation : Pointer) : boolean;
type
    TMyCompare = function(var a, b) : boolean;
var
    Relation2 : TMyCompare;
begin
    Relation2 := TMyCompare(Relation);
    Less := Relation2(a, b);
end;
Martin Rosenau
  • That explains it! Thanks a lot for researching this peculiar feature. – fuz Jan 25 '21 at 10:43
  • TP 5 afaik has procedural types (function pointers), but it can only inline this way. So if your call is relatively expensive, this might be a speed optimization for early x86. However, Delphi 2006+ and FPC 2.2+ can inline functions across units, so there is no need to do this by hand. – Marco van de Voort Jan 25 '21 at 14:13
  • This seems close to the solution, and much more elegant than assembler code that needs to be changed for every platform (just think of the new Apple CPU!). But if the type of the function call would be TMyCompare rather than Pointer, how would I call it in InsVarBBTree, still with the address operator @VariableLess? – Engelbert Buxbaum Jan 26 '21 at 13:56
  • I tried it out, but got an error: TestDynam.lpr(110,53) Error: Incompatible type for arg no. 4: Got " ", expected ";var ):Boolean;Register>". – Engelbert Buxbaum Jan 26 '21 at 14:47
  • @EngelbertBuxbaum See my EDIT section. Maybe this will work... – Martin Rosenau Jan 26 '21 at 14:56
  • Almost! The type conversion needs to be in the call to InsVarBBTree, see the solution I posted below. Many thanks – Engelbert Buxbaum Jan 27 '21 at 16:00
2

The solution is as follows. Inside the unit handling dynamic types, define:

type
    TMyCompare = function(var a, b) : boolean;

function Less (var a, b; Relation : TMyCompare) : boolean;
begin
    Result := Relation(a, b);
end;

procedure InsVarBBTree(var B: BBTree; var E; S: word; A: TMyCompare;
                       var ok: boolean);
{ puts variable E of size S into tree B. The order relation address is A. }

This is called from outside the unit, with the comparison function cast to TMyCompare:

{$F+} function VariableLess(var a, b : Index) : boolean; {$F-}
begin
...
end;

InsVarBBTree(Baum, TempStr, SizeOf(TempStr), TMyCompare(@VariableLess), OK)
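A self-contained sketch of why the cast is needed, assuming Free Pascal in objfpc mode. Index is declared here as a hypothetical LongInt alias and the body of VariableLess is invented for illustration; the real unit defines its own payload type:

```pascal
program CastDemo;
{$mode objfpc}

type
  TMyCompare = function(var a, b): boolean;
  Index = LongInt;  { hypothetical payload type, for illustration only }

{ Typed comparison: its signature differs from TMyCompare,
  so passing its address without a cast is rejected. }
function VariableLess(var a, b: Index): boolean;
begin
  Result := a < b;
end;

function Less(var a, b; Relation: TMyCompare): boolean;
begin
  Result := Relation(a, b);
end;

var
  p, q: Index;
begin
  p := 1;
  q := 2;
  { The explicit cast tells the compiler to treat the typed
    function as an untyped comparison. }
  WriteLn(Less(p, q, TMyCompare(@VariableLess)));
end.
```

The cast works because both signatures pass a and b by reference, so the same addresses arrive either way. The price is that the compiler no longer checks that the payload type matches what the comparison function expects.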

Thanks to all who helped with that.

Engelbert

1

Afaik it will work for Free Pascal's 16-bit targets, but you are probably not interested in that. For non-16-bit targets, 16-bit code needs a rewrite because the memory model and ABI are different. Your disassembly is symbolic, and the constants don't reflect the meaning of the source the way an original assembler source (.asm) would.

This procedure is particularly hairy since it seems to make assumptions on the ABI, so it is inherently unportable.

Moreover, even if you succeeded, the result would be suboptimal, since the Object Pascal compilers (Delphi, Free Pascal) optimize far more than TP ever did. Using external assembler for short procedures stunts the compiler's ability to inline.

I first thought Peter Cordes was right and that this is a kind of thunk to the real functionality after the shown procedure, but I now think Martin is closer. It is a bit illogical, but because the compiler can't really inline (it only dumps an assembler block), the parameter passing remains the same and must be undone without access to a stack frame or local variables. TP doesn't keep values in registers, so this is relatively safe.

You could try to disassemble that too, but the best approach is probably to formulate a Pascal substitute for it from the documentation and forget about all the micro-optimizations.

Marco van de Voort