
I'm using the eicar.com file and playing around with reverse engineering tools. I'd like to be able to disassemble and reassemble this file. I get close but there are still a few problems that I cannot figure out.

This is the original eicar.com ASCII file:

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

Using udcli (`udcli -noff -nohex eicar.com > stage1.asm`), I end up with this x86 assembly:

pop eax                 
xor eax, 0x2550214f     
inc eax                 
inc ecx                 
push eax                
pop ebx                 
xor al, 0x5c            
push eax                
pop edx                 
pop eax                 
xor eax, 0x5e502834     
sub [edi], esi          
inc ebx                 
inc ebx                 
sub [edi], esi          
jge 0x40                
inc ebp                 
dec ecx                 
inc ebx                 
inc ecx                 
push edx                
sub eax, 0x4e415453     
inc esp                 
inc ecx                 
push edx                
inc esp                 
sub eax, 0x49544e41     
push esi                
dec ecx                 
push edx                
push ebp                
push ebx                
sub eax, 0x54534554     
sub eax, 0x454c4946     
and [eax+ecx*2], esp    
sub ecx, [eax+0x2a]

Finally, putting it back together with nasm using the command `nasm stage1.asm -o stage2`, I end up with:

fXf5O!P%f@fAfPf[4\fPfZfXf54(P^fg)7fCfCfg)7^O<8d>^R^@fEfIfCfAfRf-  STANfDfAfRfDf-ANTIfVfIfRfUfSf-TESTf-FILEfg!$Hfg+H*

In this case I'm starting with an ASCII file and end up with a bin file that holds a lot of extra garbage.
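One way to read the extra characters: `f` and `g` are the bytes 0x66 and 0x67, the x86 operand-size and address-size override prefixes. NASM's flat binary output defaults to 16-bit mode, so every 32-bit register or 32-bit address in `stage1.asm` needs a prefix. The Python sketch below illustrates this for the first instructions of the listing. The table is written by hand from the bytes of eicar.com (it is not assembler output), and the helper name `encode_in_16bit_mode` is made up for this example.

```python
# Each entry: (mnemonic, base encoding, 32-bit operand?, 32-bit address?).
# Base encodings are the bytes as they appear in eicar.com.
INSTRUCTIONS = [
    ("pop eax",             b"X",     True,  False),
    ("xor eax, 0x2550214f", b"5O!P%", True,  False),
    ("inc eax",             b"@",     True,  False),
    ("inc ecx",             b"A",     True,  False),
    ("push eax",            b"P",     True,  False),
    ("pop ebx",             b"[",     True,  False),
    ("xor al, 0x5c",        b"4\\",   False, False),  # 8-bit operand: no prefix
    ("push eax",            b"P",     True,  False),
    ("pop edx",             b"Z",     True,  False),
    ("pop eax",             b"X",     True,  False),
    ("xor eax, 0x5e502834", b"54(P^", True,  False),
    ("sub [edi], esi",      b")7",    True,  True),
    ("inc ebx",             b"C",     True,  False),
    ("inc ebx",             b"C",     True,  False),
    ("sub [edi], esi",      b")7",    True,  True),
]


def encode_in_16bit_mode(base, operand32, address32):
    """Prepend the override prefixes a 16-bit assembler has to emit."""
    prefix = b""
    if operand32:
        prefix += b"\x66"  # operand-size override, prints as 'f'
    if address32:
        prefix += b"\x67"  # address-size override, prints as 'g'
    return prefix + base


reassembled = b"".join(
    encode_in_16bit_mode(base, op32, addr32)
    for _, base, op32, addr32 in INSTRUCTIONS
)
print(reassembled.decode("ascii"))
```

The printed string matches the start of the `stage2` output above, which suggests the garbage comes from the 32-bit/16-bit mismatch rather than from the file type.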

What am I missing here? How do I end up with the original ASCII string and have the proper file type?

EDIT: @Ross Ridge pointed out that I was disassembling a 16-bit file as 32-bit code. Fixing that cleaned up the string, but the file type is still incorrectly output as binary.

First fix: `udcli -16 -noff -nohex eicar.com > stage1.asm` to obtain the proper output string.

Results in X5O!P%@AP[4\PZX54(P^)7CC)7^O<8d>"^@EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

Still a little garbage data not present in the original but very close.
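To pin down the remaining difference, the two byte strings can be compared directly. The Python sketch below assumes that `^O<8d>"^@` is caret notation for the bytes 0x0F 0x8D 0x22 0x00; the function name `differing_span` is made up for this example. It locates the span that differs and decodes both spans as conditional jumps: 0x7D is `jge` with an 8-bit displacement, and 0x0F 0x8D is `jge` with a 16-bit displacement.

```python
# Original file contents (68 bytes), as shown in the question.
original = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

# Reassembled output, transcribed from the caret notation in the question
# (assumption: ^O = 0x0F, <8d> = 0x8D, '"' = 0x22, ^@ = 0x00).
reassembled = (rb"X5O!P%@AP[4\PZX54(P^)7CC)7"
               + b"\x0f\x8d\x22\x00"
               + b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*")


def differing_span(a, b):
    """Return (offset, a_part, b_part) after trimming common prefix and suffix."""
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, a[prefix:len(a) - suffix], b[prefix:len(b) - suffix]


offset, old, new = differing_span(original, reassembled)

# Short form: 7D rel8. The target is relative to the end of the 2-byte instruction.
short_target = offset + 2 + int.from_bytes(old[1:2], "little", signed=True)
# Near form: 0F 8D rel16. The target is relative to the end of the 4-byte instruction.
near_target = offset + 4 + int.from_bytes(new[2:4], "little", signed=True)

print(f"offset {offset:#x}: original {old.hex(' ')}, reassembled {new.hex(' ')}")
print(f"jump targets: {short_target:#x} and {near_target:#x}")
```

Both encodings jump to the same target, so the reassembled file differs only in how the `jge 0x40` instruction is encoded.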

Scot Matson
  • 2
    You're disassembling it as 32-bit code and assembling it as 16-bit code. Given the extension you should probably be disassembling it as 16-bit code. – Ross Ridge Dec 01 '16 at 22:36
  • @Ross Ridge you're right!!! That cleaned up the string. I'll edit the original question, but the output file type is still incorrect. I see that I can explicitly change this with `nasm` but I do not see an ASCII option. Is there another tool or step I should be taking into consideration? – Scot Matson Dec 01 '16 at 22:38
  • 1
    It looks like you're using the correct output file type with NASM, the binary file type. You can specify it explicitly with `-f bin`. ASCII files are binary files that only contain ASCII characters, and your source binary, `eicar.com`, happens to only contain ASCII characters. – Ross Ridge Dec 01 '16 at 22:43
  • My guess is possibly that one of the characters is still not converting properly and so it is not being properly represented as an ASCII file. This character here, `^O<8d>` which is supposed to be the `}` is the likely cause then. – Scot Matson Dec 01 '16 at 22:49
  • 1
    The differences between the input and output are probably explained by the fact that there are often different ways to assemble a given instruction. The input code may have been hand assembled so the machine code consists entirely of ASCII characters, while NASM uses the most natural and shortest encoding for instructions. – Ross Ridge Dec 01 '16 at 22:49
  • Jean-Francois Fabre pointed this possible problem out as well. Using the -O[0-2] flags unfortunately didn't result in cleaning up that pesky '}' character. – Scot Matson Dec 01 '16 at 22:50
  • 2
    There may be no way of getting NASM, or any other assembler to assemble the output of `udcli` into the original file without editing it. As I suggested earlier, the file may have been originally hand assembled without the use of an assembler. If you want something that NASM can assemble into the original you could do `od -v -A n -t x1 eicar.com | sed -e 's/ /, 0x/g' -e 's/^,/DB/'` but this isn't going to help you understand the code. For that you should just look at the disassembly, ideally with the machine code bytes on the left. – Ross Ridge Dec 01 '16 at 23:17
  • Possible duplicate of [Disassembling, modifying and then reassembling a Linux executable](http://stackoverflow.com/questions/4309771/disassembling-modifying-and-then-reassembling-a-linux-executable) – Scot Matson Dec 02 '16 at 20:51
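The `od | sed` pipeline suggested in the comments turns every byte of the file into a `DB` directive, so the assembler has no encoding choices to make. A rough Python equivalent is sketched below; the function name `to_nasm_db` and the 16-bytes-per-line layout are choices made for this example, not part of any tool.

```python
def to_nasm_db(data, per_line=16):
    """Render raw bytes as NASM DB directives, per_line bytes per directive."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("DB " + ", ".join(f"0x{byte:02x}" for byte in chunk))
    return "\n".join(lines)


# The file contents from the question, inlined so the sketch needs no input file.
eicar = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
print(to_nasm_db(eicar))
```

Assembling the printed text with `nasm -f bin` should reproduce the input byte for byte. As the comment notes, this does not help with understanding the code.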

2 Answers


In general you can't reassemble the output of a disassembler back into exactly the same binary file as the original. There is often more than one way to assemble a given assembly instruction into machine code. It's also not very helpful for your ultimate goal of understanding the code you're trying to do this with. Even if you do get something that you can assemble back into the original code, it's extremely unlikely you'll get something you can modify and assemble into code that works.

To illustrate this I've provided my own "disassembly" of the eicar.com file, one that allows it to be modified to a limited extent. You can modify the string it prints, so long as the message isn't too long and doesn't contain any dollar sign `$` characters. You should be able to modify the string while still keeping the output consisting only of printable ASCII characters, assuming you only put printable ASCII characters in the string.

    BITS    16
    ORG     0x100

ascii_shift EQU 0x097b

start:
    pop     ax
    xor     ax, 0x2000 | (skip - start + 0x100) | 0x000f
    push    ax
    and     ax, 0x4000 | (skip - start + 0x100)
    push    ax
    pop     bx
    xor     al, (msg - start) ^ (skip - start)
    push    ax
    pop     dx
    pop     ax
    xor     ax, (0x2000 | (skip - start + 0x100) | 0x000f) ^ ascii_shift
    push    ax
    pop     si
    sub     [bx], si
    inc     bx
    inc     bx
    sub     [bx], si
    jnl     skip

msg:
    DB      'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!'
    DB      '$'

%if ($ - msg) < 0x21
    TIMES   0x21 - ($ - msg) DB '$'
%endif

skip:
    DW      0x21cd + ascii_shift
    DW      0x20cd + ascii_shift

%if skip - msg > 0x7e
%error  'msg too long'
%endif

I won't explain how the code works, but I'll give you one hint: MS-DOS pushes a 16-bit 0 value on the stack at the start of execution of a .COM format executable.
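For readers who want to check the arithmetic against that hint, here is a rough register trace in Python. It is only a sketch: the label addresses (`MSG` = 0x11C, `SKIP` = 0x140) are the ones the unmodified message would produce, the dictionary `mem` models just the two words at `skip`, and flags and the conditional jump are not simulated.

```python
ASCII_SHIFT = 0x097B
START, MSG, SKIP = 0x100, 0x11C, 0x140  # assumed addresses for the original message

# The two DW words stored at skip, before the program patches them.
mem = {SKIP: 0x21CD + ASCII_SHIFT, SKIP + 2: 0x20CD + ASCII_SHIFT}
stack = [0x0000]  # the 16-bit zero that MS-DOS pushes for a .COM program

ax = stack.pop()                                # pop  ax
ax ^= 0x2000 | (SKIP - START + 0x100) | 0x000F  # xor  ax, ...
stack.append(ax)                                # push ax
ax &= 0x4000 | (SKIP - START + 0x100)           # and  ax, ...
stack.append(ax)                                # push ax
bx = stack.pop()                                # pop  bx
ax ^= (MSG - START) ^ (SKIP - START)            # xor  al, ... (value fits in one byte)
stack.append(ax)                                # push ax
dx = stack.pop()                                # pop  dx
ax = stack.pop()                                # pop  ax
ax ^= (0x2000 | (SKIP - START + 0x100) | 0x000F) ^ ASCII_SHIFT
stack.append(ax)                                # push ax
si = stack.pop()                                # pop  si
mem[bx] = (mem[bx] - si) & 0xFFFF               # sub  [bx], si
bx += 2                                         # inc  bx (twice)
mem[bx] = (mem[bx] - si) & 0xFFFF               # sub  [bx], si

print(f"ax={ax:#06x} dx={dx:#06x} si={si:#06x}")
for address in (SKIP, SKIP + 2):
    print(f"{address:#06x}: {mem[address].to_bytes(2, 'little').hex(' ')}")
```

Under these assumptions the two words at `skip` become the bytes `cd 21` and `cd 20`, which are `int 0x21` and `int 0x20`. At that point AH holds 0x09 and DX holds the address of `msg`.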

Ross Ridge

The problem is that the disassembler does not distinguish between code and data.

Notice this:

sub eax, 0x54534554     ; 'TEST'
sub eax, 0x454c4946     ; 'FILE'

(and all the sub eax statements)

This is not really code (it makes no sense to subtract both values without using the result in between). It is part of the message: there's TEST in the first instruction, then FILE in the second.
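A short Python check makes this visible. The opcode for `sub eax, imm32` is 0x2D, which is the ASCII hyphen, and the 32-bit immediate is stored little-endian. The helper name `encode_sub_eax` is made up for this example.

```python
import struct


def encode_sub_eax(imm32):
    """Encode `sub eax, imm32`: opcode 0x2D followed by a little-endian immediate."""
    return b"\x2d" + struct.pack("<I", imm32)


# The four immediates from the 32-bit disassembly in the question.
for imm in (0x4E415453, 0x49544E41, 0x54534554, 0x454C4946):
    print(f"sub eax, {imm:#010x} -> {encode_sub_eax(imm).decode('ascii')}")
```

Each "instruction" is simply a hyphen followed by four letters of the test string.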

So when you reassemble it, the assembler may pick a different encoding for an instruction (`sub` can be encoded in more than one way), which breaks your data. You have to identify the data sections so they're not treated as code by your assembler.

Another way to go is to turn off all assembling optimizations.

Jean-François Fabre
  • "turn off all assembling optimizations." I wouldn't describe it like that (I don't think the effort of the assembler is worth the word "optimization", it's more like just finding the leanest opcode which still fits the source)... It's more like specifying that a particular instruction be assembled in a certain particular way. And that may prove to be quite difficult. For example, I have a hard time imagining how you would force `nasm` to produce `mov al,[ds:bx]` including the `ds` prefix opcode (except the obvious `db 0x3E` ahead of `mov al,[bx]` in source). But the disassembler will merge it, I suppose. – Ped7g Dec 02 '16 at 15:08
  • yes, no global optimizations, just instruction optimizations: choosing a shorter operand or doing something equivalent to save instruction cycle (like sub eax,eax to zero eax) – Jean-François Fabre Dec 02 '16 at 15:26