I tried to make a really small interpreter in Golang yet it is 5x slower than python

Question

I was trying to create a basic interpreter in Golang. It has just a few instructions (set,jump,add,greater). I tried to run a script which counts to a hundred million and it is already 10 times slower than python. I am not sure why this is happening and would really appreciate someone helping me out.

Here is the source code of the interpreter.

package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)



func main() {
    symbol_table := make(map[string]float64)
    dat, err := os.ReadFile("alu.vtl")
    if (err!=nil) {
        return
    }
    code:=string(dat)
    codex := strings.Split(code, "\n")
    i := 0
    for {
        x := codex[int(i)]
        if (x=="" || (string(x[0])=="/" && string(x[1])=="/")) {
            i = i + 1
            if !(i < len(codex)) {
                break
            }
            continue
        }
        args := strings.Split(strings.Split(x, " ")[1], ",")
        switch opcode := strings.Split(x, " ")[0]; opcode {
        case "set":
            if err == nil {
                num,err:=strconv.ParseFloat(args[1], 64)
                if (err!=nil) {
                    fmt.Println("Ye kya krdiya")
                    return
                }
                symbol_table[args[0]] = num
            } else {
                symbol_table[args[0]] = symbol_table[args[1]]
            }
        case "jump":
            if (symbol_table[args[1]]==1.0) {
                i = int(symbol_table[args[0]])
                continue
            }
        case "add":
            symbol_table[args[2]] = symbol_table[args[0]]+symbol_table[args[1]]
        case "greater":
            res:=float64(0)
            if (symbol_table[args[0]]>symbol_table[args[1]]) {
                res=1
            }
            symbol_table[args[2]] = res
    }
    i = i + 1
    if !(i < len(codex)) {
        break
    }
    }
    fmt.Println(symbol_table)
}

Here is the file i was trying to run

set final,10000000
set count,0
set line_jump,4
set add_count,1
add count,add_count,count
greater final,count,c
jump line_jump,c

I was very shocked when it was so slow as I was expecting it to be atleast a bit faster than python.

I would highly appreciate someone please helping me out on why this is so slow.

Finally after making the suggested changes, my interpreter is now 7x faster than python.

package main

import (
    "fmt"
    "os"
    "strings"
    "strconv"
)

// func do_nothing(x interface{}) {}

func contains_int64(a int64, list []int64) map[string]int64 {
    res:=make(map[string]int64)
    for i, b := range list {
        if b == a {
            res["index"]=int64(i)
            res["contains"]=1
            return res
        }
    }
    res["contains"]=0
    res["index"]=-1
    return res
}

func contains_str(a string, list []string) map[string]int64 {
    res:=make(map[string]int64)
    for i, b := range list {
        if b == a {
            res["index"]=int64(i)
            res["contains"]=1
            return res
        }
    }
    res["contains"]=0
    res["index"]=-1
    return res
}

func main() {
    dat, err := os.ReadFile("alu.vtl")

    const_symbol_table:=make([]int64,0)
    lookup_symbol_table:=make([]string,0)
    symbol_table:=make([]int64,0)

    if (err!=nil) {
        return
    }
    code:=string(dat)
    codex := strings.Split(code, "\n")

    bytecode:=make([][]int64,0)
    var_len:=0
    const_len:=0

    for i := 0; i < len(codex); i++ {
        args:=strings.Split(strings.Split(codex[i], " ")[1], ",")
        current_byte_code:=make([]int64,0)
        switch opcode:=strings.Split(codex[i], " ")[0]; opcode {
        case "set":
            num,err:=strconv.ParseInt(args[1],10,64)
            if (err!=nil) {
                fmt.Println("Unable to parse number")
                return
            }
            contains_res:=contains_int64(num,const_symbol_table)
            contains1_res:=contains_str(args[0],lookup_symbol_table)
            var variable_index int;
            if (contains1_res["contains"]==1) {
                variable_index=int(contains1_res["index"])
            } else {
                lookup_symbol_table = append(lookup_symbol_table, args[0])
                symbol_table = append(symbol_table, 0)
                variable_index=var_len
                var_len+=1
            }
            if (contains_res["contains"]==1) {
                current_byte_code = append(current_byte_code, 0,int64(variable_index),contains_res["index"])
            } else {
                const_symbol_table = append(const_symbol_table, num)
                current_byte_code = append(current_byte_code, 0,int64(variable_index),int64(const_len))
                const_len+=1
            }
            bytecode = append(bytecode, current_byte_code)
        case "refset":
            contains_res:=contains_str(args[0],lookup_symbol_table)
            contains1_res:=contains_str(args[1],lookup_symbol_table)
            current_byte_code = append(current_byte_code, 1,contains_res["index"],contains1_res["index"])
            bytecode = append(bytecode, current_byte_code)
        case "jump":
            contains_res:=contains_str(args[0],lookup_symbol_table)["index"]
            contains1_res:=contains_str(args[1],lookup_symbol_table)["index"]
            current_byte_code = append(current_byte_code, 2,contains_res,contains1_res)
            bytecode = append(bytecode, current_byte_code)
        case "equals","add","greater":
            contains_res:=contains_str(args[0],lookup_symbol_table)["index"]
            contains1_res:=contains_str(args[1],lookup_symbol_table)["index"]
            contains2_res:=contains_str(args[2],lookup_symbol_table)["index"]
            var int64_opcode int64;
            if (opcode=="equals") {int64_opcode=3}
            if (opcode=="greater") {int64_opcode=4}
            if (opcode=="add") {int64_opcode=5}
            current_byte_code = append(current_byte_code, int64_opcode,contains_res,contains1_res,contains2_res)
            bytecode = append(bytecode, current_byte_code)
        }
    }
    fmt.Println(symbol_table,const_symbol_table,bytecode)
    for i := 0; i < len(bytecode); i++ {
        current_byte_code:=bytecode[i]
        switch opcode:=current_byte_code[0]; opcode {
        case 0:
            symbol_table[current_byte_code[1]]=const_symbol_table[current_byte_code[2]]
        case 1:
            symbol_table[current_byte_code[1]]=symbol_table[current_byte_code[2]]
        case 2:
            if (symbol_table[current_byte_code[1]]==1) {
                i=int(const_symbol_table[current_byte_code[2]])-1
            }
        case 3:
            if (symbol_table[current_byte_code[1]]==symbol_table[current_byte_code[2]]) {
                symbol_table[current_byte_code[3]]=1
            } else {
                symbol_table[current_byte_code[3]]=0
            }
        case 4:
            if (symbol_table[current_byte_code[1]]>symbol_table[current_byte_code[2]]) {
                symbol_table[current_byte_code[3]]=1
            } else {
                symbol_table[current_byte_code[3]]=0
            }
        case 5:
            symbol_table[current_byte_code[3]]=symbol_table[current_byte_code[1]]+symbol_table[current_byte_code[2]]
        }
    }
    fmt.Println(symbol_table,const_symbol_table)
}

Input code

set jump_count,3
set final,1000000000
set increment,1
set is_bigger,0
set line_jump,5
add increment,jump_count,jump_count
greater final,jump_count,is_bigger
jump is_bigger,line_jump

Python is compiled to bytecode, which is then interpretted. That bytecode avoids much of your string handling overhead. — MatBailie, Jun 11 '23 at 05:30
How do you run your code? How do you measure how "fast" it is? — Volker, Jun 11 '23 at 05:32
Your expectation is just way off. Cpython is a fairly advanced bytecode interpreter with 30 years of work put into its performance. Yours is a simplistic string interpreter with zero optimization. — hobbs, Jun 11 '23 at 05:33
I am measuring the runtime of both files. I believe Cpython is fast but i did not expect my interpreter to be this slow. @hobbs — Aarav Dayal, Jun 11 '23 at 05:46
@AaravDayal: Most of your language's operations (add, greater, ...) could be done by a CPU in a single instruction. You'd want to estimate how many CPU instructions the interpreter consumes just to do "one CPU instruction worth" of work (e.g. one addition). To start, I'd estimate that the `strings.Split(...` is around 100 instructions all by itself (assuming it's using array slices internally to avoid memory management overhead). That's already enough to estimate that the interpreter will be several hundred times slower. — Brendan, Jun 11 '23 at 06:46
@AaravDayal: For Python performance; it depends what you're comparing against. Cython compiles to native machine code and PyPy does JIT, so they'll be relatively fast. Cpython is a reference implementation (designed to be a correct reference for how the language should behave and not designed to be fast) so it's slow. However, even for Cpython the "raw text" is pre-parsed into byte-code and then the byte-code is interpreted (to avoid dealing with text strings when the program is running), so Cpython's "slow" is probably an order of magnitude faster than your interpreter. — Brendan, Jun 11 '23 at 06:54
See this question on how to efficiently [scan a file line by line](https://stackoverflow.com/questions/8757389/reading-a-file-line-by-line-in-go) — colm.anseo, Jun 11 '23 at 13:21
@Brendan What do you mean by "to avoid dealing with text strings when the program is running"? You would still have to deal with strings wouldn't you? Like even if I convert it into some kind of a bytecode, wouldn't I still have to deal with the fact the a variable has a value of type string? Sorry, I might have misunderstood what you meant. Kindly help me understand It a bit better. — Aarav Dayal, Jun 19 '23 at 15:46
@AaravDayal Sure if your language included strings (which it doesn't seem to at the moment), your interpreter would have to deal with strings in the cases where the program actually involves strings. But what Brendan was talking about was the fact that you're using string operations to check what the current operation is, what the arguments are etc. So executing any instruction involves three calls to `Split` and one switch on the first string. And many instructions also involve a conversion from string to float every time they're executed. — sepp2k, Jun 19 '23 at 15:55
If you used bytecodes, all the splits and conversions would happen only once at parse time (in fact, an efficient parser would not involve any splits at all, but optimizing the parser probably won't have that big of an impact - the important thing is to only have to do the parsing once). And then the switch would work on an integer or enum, not a string. Even the variable lookup can be done using integer IDs rather than variable names as strings. — sepp2k, Jun 19 '23 at 15:57
@sepp2k Thank you so much for guiding me and clarifying me about what improvements can be done! — Aarav Dayal, Jun 19 '23 at 19:12
@sepp2k With your help I was able to make my interpreter 7 times faster than python. Thank you so much for making me understand what was required to be done! — Aarav Dayal, Jun 20 '23 at 16:47

I tried to make a really small interpreter in Golang yet it is 5x slower than python

0 Answers0