What are the best practices for interpreting an assembly program without knowing what it's supposed to do?

Question

I'm studying for an assembly exam in which we have to interpret an assembly source file provided without comments or any other context, a task I find particularly difficult. My question is whether, in cases like this (which I assume happen quite often in a professional environment), there are clues to look for that reveal the flow and the purpose of the program. I can recognize loops from branch and jump instructions, but not much else, and I can't find any good source online. Below is an example program for reference, the one I'm currently having a hard time with, in case anyone wants to point something out.

.data
  data_:  .byte  20, -40, -80
  pow_:  .alloc  804
  d:  .alloc  8

.global main

main:
 sub:  
   MOV  r0, #-100
   MOV  r2, #-1
   EOR  r1, r2, r0
   ADD  r1, r1, #1
   LDR  r2, =data_
   ADR  r9, d
   ADR  r8, pow_
   LDRB  r3, [r2]
   LDRB  r4, [r2, #1]
   LDRB  r5, [r2, #2]
 mpt:  
   MUL  r6, r0, r0
   MUL  r6, r3, r6
   MUL  r7, r0, r4
   ADD  r6, r5, r6
   ADD  r6, r7, r6
   STR  r6, [r8]
   CMP  r0, #-100
   BGT  aft 
 r:  
   STR  r6, [r9, #4]
   STR  r0, [r9]
 cyc:
   ADD  r8, r8, #4
   ADD  r0, r0, #1
   CMP  r0, r1
   BLE  mpt 
   MOV  r15, r14
 aft: 
   LDR  r2, [r9, #4]
   CMP  r6, r2
   BLT  r
   B  cyc

This code is ARM7. `.alloc` isn't a real directive; it's just there to signify an allocation of n bytes under the label provided. From my understanding the program performs a loop of some sort while keeping an iteration counter, but I can't even work out why there is an EOR there, which turns #-100 and #-1 into #99.

Any kind of suggestion is welcome.

The only possible way to figure it out, if at all, is to run it through the debugger. Even so, it's highly unlikely that you'll understand it, especially if it's translated from a high-level language. — Michael Chourdakis, Jun 04 '22 at 16:30
One good approach is to try and translate the program line by line into high-level code. Then see if you can recognise any of the things it does. — fuz, Jun 04 '22 at 17:54
"i can't get my head around even why there is an EOR there, which converts to #-100 and #-1 to a #99". The code appears to be written to be intentionally obtuse. Nobody in real life would use EOR that way. They would just `MOV r1, #99`. — Raymond Chen, Jun 05 '22 at 01:28
@MichaelChourdakis: That's over-stating things a bit. If you know assembly well, it's not that hard to read simple loops, basically simulating it in your head. Having *some* idea of what it does helps, e.g. if you're looking for how a compiler auto-vectorized some C that you have source for. But even if not, people do reverse-engineer parts of real binaries sometimes, especially malware. (Typically small, and no other viable way to see what it does.) — Peter Cordes, Jun 05 '22 at 01:30
@MichaelChourdakis: Or reversing un-obfuscated compiler-generated asm is a fun homework assignment for students learning assembly, often in the form of a "bomb lab" like [Bomb Lab phase 5: 6 char string substitution lookup table, strings\_not\_equal](https://stackoverflow.com/q/22032475) (example slides introducing the concept to students: https://www.cs.cmu.edu/afs/cs/academic/class/15213-s20/www/recitations/recitation04-bomblab.pdf or an assignment description http://www.csc.villanova.edu/~mdamian/Past/csc8400fa15/labs/bomblab.pdf). Typically x86 so students can try on their own machines. — Peter Cordes, Jun 05 '22 at 01:30

xiver77 · Accepted Answer · 2022-06-04T19:34:04.117

You can translate each assembly instruction into some high-level form and then reduce the result to a simpler form. Have a look at the following example, which translates your `mpt` block, and you'll get the basic idea.

 mpt:  
   MUL  r6, r0, r0
   MUL  r6, r3, r6
   MUL  r7, r0, r4
   ADD  r6, r5, r6
   ADD  r6, r7, r6
   STR  r6, [r8]
   CMP  r0, #-100
   BGT  aft

First, translate line by line.

r6 = r0 * r0
r6 = r6 * r3
r7 = r0 * r4
r6 = r6 + r5
r6 = r6 + r7
*r8 = r6
if (r0 > -100) goto aft

Then, make it simpler.

r7 = r0 * r4
r6 = r0 * r0 * r3 + r5 + r7
*r8 = r6
if (r0 > -100) goto aft

You'll get some readable code by applying the same procedure to the other parts of your code.

Your simpler pseudocode seems to get the assignment to `r7` wrong. It should happen first. — ecm, Jun 04 '22 at 19:29
