71

I'd like to make a simple x86 assembler. I'm wondering if there are any tutorials for making your own assembler, or if there's a simple assembler that I could study.

Also, I wonder what tools are used in looking at and handling the binary/hex of programs.

mudgen
  • 2
    I would prefer developing the assembler in C. – mudgen Mar 19 '10 at 14:48
  • 1
    Some of the references listed in http://stackoverflow.com/questions/1669/learning-to-write-a-compiler probably address this as a phase of compilation (and some probably do not). [[Related, *not* a duplicate.]] – dmckee --- ex-moderator kitten Mar 19 '10 at 15:58
  • 3
    Also take a look @ "How to write a disassembler?" http://stackoverflow.com/questions/924303/how-to-write-a-disassembler – claws Apr 20 '10 at 05:03
  • 7
    This is valid and not _too broad_; the OP's just looking for tutorials or resources about it. Is writing an assembler _too broad_ for _Stack Overflow_? _High-level_ guys.. – Константин Ван Aug 22 '17 at 00:51
  • 1
    I'm writing my own 6502 assembly compiler and this is how I've done it: use regex strings to identify mnemonics (i.e. commands), any special bracket & special notation with regex back trace (for different addressing modes), and regex hex, binary and decimal recognition. Then set up mnemonic recognition routines to identify the variations of each mnemonic and just send the appropriate bytes to an output file. You'll just need a list of mnemonics, their addressing modes, syntax, and opcodes. –  Nov 07 '20 at 06:21

4 Answers

64

This is what you are looking for:

Assemblers And Loaders - By David Salomon. Published February, 1993 - Freely available (download here)

Of course, you are going to need the following:

  1. Intel® 64 and IA-32 Architectures Software Developer's Manuals
  2. AMD-64 Architecture Programmers manual
  3. Linkers and Loaders by John R. Levine (freely available)
  4. ELF File Format Specifications : System V ABI Update
  5. Microsoft Portable Executable and Common Object File Format Specification

You can always refer to the implementations of open-source assemblers:

  1. Netwide Assembler (NASM)
  2. GNU Assembler (GAS)
kbmz
claws
  • 4
    I'd like to add a bit to this answer but in a form of a comment: check out this link: http://www.plantation-productions.com/Webster/RollYourOwn/index.html – Francis Cugler Apr 24 '18 at 03:44
5

Just a very tiny piece of code in Delphi 7.

{$APPTYPE CONSOLE}
program assembler;
uses sysutils;
const
s1=#0#77#1#90#59#64#4#80#1#69#3#76#1#1#1#1#14#224#2#15#1#1#1#11#1#1#1#1#1#64#13+
#116#1#16#13#64#3#16#4#2#3#1#8#3#2#10#7#32#4#2#7#3#5#16#4#16#5#1#10#16#13#16#3+
#184#124#184#5#16#3#184#5#2#15#96#3#224#173#52#1#16#3#40#1#16#23#65#1#16#3#80#1+
#16#7#75#1#69#1#82#1#78#1#69#1#76#1#51#1#50#1#46#1#68#1#76#1#76#4#71#1#101#1+
#116#1#83#1#116#1#100#1#72#1#97#1#110#1#100#1#108#1#101#4#87#1#114#1#105#1#116+
#1#101#1#67#1#111#1#110#1#115#1#111#1#108#1#101#1#65#2#72#1#101#1#108#1#108#1+
#111#1#44#1#32#1#87#1#111#1#114#1#108#1#100#1#33#1#13#1#10#5#0;
s3=#1#185#1#7#4#136#1#195#1#128#1#227#1#15#1#193#1#216#1#4#1#128#1#251#1#9+
#1#118#1#3#1#128#1#195#1#39#1#128#1#195#1#48#1#136#1#153#1#96#1#16#1#64#2#73#1+
#125#1#228#1#106#2#104#1#112#1#16#1#64#2#106#1#8#1#104#1#96#1#16#1#64#2#106#1+
#245#1#255#1#21#1#40#1#16#1#64#2#80#1#255#1#21#1#44#1#16#1#64#2#195;
var
  f:file of byte;p,i:integer;o:string;
  t:text;line:string;
procedure w(s: string);
begin
  i:=1;
  while i<length(s) do begin
    inc(p,ord(s[i]));
    setlength(o, p);
    o[p]:=s[i+1];
    inc(i,2);
  end;
end;
procedure al(b: byte);
var
  a: longword;pc: pchar;
begin
  a := strtoint(line); pc:=@a;
  o := o + chr(b) + pc^ + (pc+1)^ + (pc+2)^ + (pc+3)^; inc(p,5); // opcode byte b, then imm32 in little-endian byte order
end;
begin
  assign(f,'out.exe');
  rewrite(f);
  p:=1;
  w(s1);
  assignfile(t, ''); reset(t);
  while not eof(t) do begin
    readln(t, line); line := trim(line);
    if copy(line,1,8) = 'mov eax,' then begin
      system.delete(line,1,8);
      al($b8); // mov eax, imm32
    end
    else if copy(line,1,8) = 'add eax,' then begin
      system.delete(line,1,8);
      al($05); // add eax, imm32
    end
    else if copy(line,1,8) = 'and eax,' then begin
      system.delete(line,1,8);
      al($25); // and eax, imm32
    end
  end;
  closefile(t);
  w(s3);
  blockwrite(f,o[1],p); close(f);
end.

The assembler understands only three instructions, "mov eax,immed32", "add eax,immed32" and "and eax,immed32", and supports neither data nor labels. It reads the source from standard input and produces a tiny Windows PE executable, out.exe, that prints the final value of eax in hex.

Attention: In my case, Avira Free Antivirus doesn't like the output. It's a false positive, and I had to switch off the real-time protection. If you are unsure whether the result is malware, check it with a debugger (it's not!).
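For readers who would rather see the encoding step in C, here is a minimal sketch of just that part, not a translation of the whole program. It covers the same three instructions: one opcode byte (0xB8, 0x05 or 0x25) followed by the 32-bit immediate in little-endian order. The function name `encode_line` is hypothetical, the PE header and the print-eax epilogue from the Delphi constants are left out, and `strtoul` with base 0 is an assumption about which number formats to accept (decimal, `0x` hex, leading-zero octal).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Encode one source line into buf (at least 5 bytes).
   Returns 5 on success, 0 if the line is not one of the three
   supported forms. encode_line is an illustrative name only. */
static size_t encode_line(const char *line, uint8_t *buf)
{
    /* Same fixed prefixes the Delphi code compares against. */
    static const struct { const char *prefix; uint8_t opcode; } table[] = {
        { "mov eax,", 0xB8 },   /* MOV EAX, imm32 */
        { "add eax,", 0x05 },   /* ADD EAX, imm32 */
        { "and eax,", 0x25 },   /* AND EAX, imm32 */
    };

    assert(line != NULL && buf != NULL);

    /* Skip leading blanks, like trim() in the original. */
    while (*line == ' ' || *line == '\t')
        line++;

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t n = strlen(table[i].prefix);
        if (strncmp(line, table[i].prefix, n) != 0)
            continue;

        char *end;
        uint32_t imm = (uint32_t)strtoul(line + n, &end, 0);
        if (end == line + n)
            return 0;           /* no number after the comma */

        buf[0] = table[i].opcode;
        buf[1] = (uint8_t)(imm & 0xFF);          /* lowest byte first */
        buf[2] = (uint8_t)((imm >> 8) & 0xFF);
        buf[3] = (uint8_t)((imm >> 16) & 0xFF);
        buf[4] = (uint8_t)((imm >> 24) & 0xFF);
        return 5;
    }
    return 0;
}
```

A caller would read lines, call `encode_line` on each, and append the returned bytes to the output buffer between the header and the epilogue. Writing the immediate byte by byte keeps the result little-endian regardless of the host machine.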

NilsB
  • 8
    Why Delphi and not C or C++? I do not think any software builder uses Delphi anymore. Can you translate this code to C or C++? – Ibrahim Ipek Aug 08 '16 at 10:22
5

I wrote one a long time ago. It is as simple as getting the x86 assembler reference guide from Intel and writing the bytes to a .com file (a flat binary format that runs under DOS and 32-bit Windows). I wish I could find the old forum post I made about it. It was written in D++, which just goes to show you can do it in any language. Just tokenize your string and translate it.
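To make "tokenize your string and translate it" concrete, here is a rough C sketch for a handful of 16-bit instructions of the kind a .com file would contain. It is only an illustration, not the D++ program mentioned above. The names `assemble_line` and `reg16_index` are hypothetical, and the supported subset (`mov r16, imm16` for ax/cx/dx/bx, `int imm8`, `ret`), the lowercase-only matching and the 63-character line limit are all simplifying assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Register number used by the "B8 + r" form of MOV r16, imm16. */
static int reg16_index(const char *name)
{
    static const char *regs[] = { "ax", "cx", "dx", "bx" };
    for (int i = 0; i < 4; i++)
        if (strcmp(name, regs[i]) == 0)
            return i;
    return -1;
}

/* Tokenize one line and write its machine code to out (at least 3 bytes).
   Returns the number of bytes emitted, or 0 if the line is not understood. */
static size_t assemble_line(const char *src, uint8_t *out)
{
    char line[64];

    assert(src != NULL && out != NULL);

    /* strtok modifies its input, so work on a bounded copy. */
    strncpy(line, src, sizeof line - 1);
    line[sizeof line - 1] = '\0';

    char *mnemonic = strtok(line, " ,\t");
    if (mnemonic == NULL)
        return 0;                       /* blank line */
    char *op1 = strtok(NULL, " ,\t");
    char *op2 = strtok(NULL, " ,\t");

    if (strcmp(mnemonic, "ret") == 0 && op1 == NULL) {
        out[0] = 0xC3;                  /* RET (near) */
        return 1;
    }
    if (strcmp(mnemonic, "int") == 0 && op1 != NULL && op2 == NULL) {
        out[0] = 0xCD;                  /* INT imm8 */
        out[1] = (uint8_t)strtoul(op1, NULL, 0);
        return 2;
    }
    if (strcmp(mnemonic, "mov") == 0 && op1 != NULL && op2 != NULL) {
        int r = reg16_index(op1);
        if (r < 0)
            return 0;                   /* unsupported destination */
        unsigned long imm = strtoul(op2, NULL, 0);
        out[0] = (uint8_t)(0xB8 + r);   /* MOV r16, imm16 */
        out[1] = (uint8_t)(imm & 0xFF); /* little-endian immediate */
        out[2] = (uint8_t)((imm >> 8) & 0xFF);
        return 3;
    }
    return 0;
}
```

Because a .com file has no header and is simply loaded at offset 0x100 and executed, a caller could concatenate the bytes from each line and write them with `fwrite` to a file opened in "wb" mode. For example, `mov ax, 0x4c00` followed by `int 0x21` assembles to B8 00 4C CD 21, which exits back to DOS. A fuller assembler would replace the `if` chain with an opcode table and add labels, which need a second pass.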

Jeremy Boyd
3

As far as example code goes...

I don't know of any "simple" assemblers, though.

pioto