
We are writing a Python program that attempts to synthesize a (simple) Haskell function given input-output pairs. Throughout the run of the program, we generate Haskell code and check its correctness against the user-supplied examples. Suppose we get as input "1 2" and expected output "3". We would (eventually) come up with the plus function. We would then run (\x y -> x + y) 1 2 in Haskell and check if it evaluates to 3.

The way we currently do things is by running the following python code:

from subprocess import Popen, PIPE, STDOUT

proc = Popen(f'ghc -e "{haskell_code}"', shell=True, stdout=PIPE, stderr=STDOUT)
haskell_output = proc.stdout.read().decode('utf-8').strip('\n')

As neither of us is familiar with ghc, Haskell, processes, or really anything to do with any of this, we were hoping someone could help us with performing this task in a (much) more efficient manner, as this is currently very slow.

Additionally, we would like to be able to perform more than a single statement. For example, we would like to import Data.Char so that our function can use "toUpper". However, the way we are currently doing this is by sending a single lambda function with the inputs appended to it, and we aren't sure how to add an import statement above that (adding "\n" did not seem to work).

To summarize, we would like the fastest (runtime) solution that would allow us to test Haskell functions from Python (where we don't have the code for all the functions in advance or at one point in time, but rather test as we generate them), while allowing us to use more than a single statement (for example, importing).

Apologies if any of this is trivial or stupid, any help would be highly appreciated.

CompareTwo
  • IIRC `ghc -e` is pretty restricted. You might be able to just use `Data.Char.toUpper` fully qualified, or write a whole file and `runghc` it. – melpomene Apr 13 '18 at 09:20
  • Using Data.Char.toUpper seems to solve the second problem. However, this would require us using the fully qualified name rather than simply "toUpper". Would using runghc be as fast, or faster, than ghc -e? Thank you for the quick response! – CompareTwo Apr 13 '18 at 09:24
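Following melpomene's comment, the file-based variant can be sketched like this: write a complete Haskell module (imports included) to a temp file and run it with runghc. This is slower per attempt than a persistent session, but it lifts the single-statement restriction of `ghc -e`. The helper names below are illustrative, not part of any library, and the final call only runs when a GHC toolchain is actually installed.

```python
import shutil
import subprocess
import tempfile


def build_module(imports, expr, args):
    # A full Haskell source file, so import lines are allowed at the top.
    header = '\n'.join(f'import {m}' for m in imports)
    return f'{header}\nmain :: IO ()\nmain = print (({expr}) {args})\n'


def run_module(source):
    # Write the module to a temp .hs file and let runghc interpret it.
    with tempfile.NamedTemporaryFile('w', suffix='.hs', delete=False) as f:
        f.write(source)
        path = f.name
    out = subprocess.run(['runghc', path], capture_output=True, text=True)
    return out.stdout.strip()


src = build_module(['Data.Char'], '\\x -> toUpper x', "'a'")
if shutil.which('runghc'):  # only attempt the run when GHC is installed
    print(run_module(src))
```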

1 Answer


This seems like an odd thing to be doing... but interesting nonetheless.

Two things come immediately to mind here. The first is to use the GHCi REPL instead of spawning a new process for every eval attempt. The idea is to stream your I/O into a single long-lived ghci process rather than spawning a new ghc process for each attempt; the overhead of starting a fresh process per eval is quite a performance killer. I'd usually go for expect, but since you want Python, I'll call on pexpect:

import time
from subprocess import Popen, PIPE, STDOUT

import pexpect


REPL_PS = 'Prelude> '
LOOPS = 100


def time_function(func):
    def wrapper(*args, **kwargs):
        ts = time.time()
        func(*args, **kwargs)
        te = time.time()
        print('total time', te - ts)
    return wrapper


@time_function
def repl_loop():
    # One long-lived ghci process; each eval is just a line over its stdin.
    repl = pexpect.spawnu('ghci')
    repl.expect(REPL_PS)
    for i in range(LOOPS):
        repl.sendline('(\\x y -> x + y) 1 2')
        _, haskell_output = repl.readline(), repl.readline()
        repl.expect(REPL_PS)


@time_function
def subproc_loop():
    # A fresh ghc process per eval, for comparison.
    for i in range(LOOPS):
        proc = Popen('ghc -e "(\\x y -> x + y) 1 2"', shell=True, stdout=PIPE, stderr=STDOUT)
        haskell_output = proc.stdout.read().decode('utf-8').strip('\n')
        # print(haskell_output)


repl_loop()
subproc_loop()

This gave me a very consistent >2x speed boost.

See the pexpect docs for more info: https://github.com/pexpect/pexpect/
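A nice side effect of the long-lived session is that it also answers the import part of the question: setup lines only need to be sent once, before the eval loop. GHCi accepts `import Data.Char` (or `:module + Data.Char`) interactively, so later evals can use plain `toUpper`. A sketch, where `attempt_line` is a hypothetical helper of mine, not a pexpect API:

```python
# Lines to send to ghci once, before any eval attempts.
SETUP_LINES = [
    ':set prompt "Prelude> "',  # pin the prompt so expect() stays reliable
    'import Data.Char',         # later evals can now use plain toUpper
]


def attempt_line(expr, args):
    # One line per candidate function, inputs appended as in the question.
    return f'({expr}) {args}'


line = attempt_line('\\x -> toUpper x', "'a'")
# With a pexpect repl as above, you would do something like:
#   for s in SETUP_LINES:
#       repl.sendline(s)
#       repl.expect(REPL_PS)
#   repl.sendline(line)
```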

The second immediate idea would be to use some distributed computing. I don't have the time to build a full-blown demo here, but there are many great examples already living in the land of the internet and SO. The idea is to have multiple "python + ghci" worker processes reading eval attempts from a common queue, then pushing the results to a common eval-attempt checker. I don't know much about ghc(i), but a quick check shows that ghci is a multithreaded process, so pulling this off may require multiple machines, each attempting a different subset of the attempts in parallel.

Some links that may be of interest here:

How to use multiprocessing queue in Python?

https://docs.python.org/2/library/multiprocessing.html

https://eli.thegreenplace.net/2012/01/24/distributed-computing-in-python-with-multiprocessing
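The worker-queue pattern above can be sketched in a self-contained way. Real workers would each own a persistent ghci session (e.g. via pexpect); here a stub evaluator stands in so the shape of the pipeline is visible without a GHC install, and threads stand in for separate processes/machines:

```python
import queue
import threading


def evaluate(candidate, inputs):
    # Stand-in for "send line to ghci, read result back"; hypothetical stub.
    return str(eval(candidate)(*inputs))


def worker(attempts, results):
    # Each worker pulls candidates off the shared queue until it sees a sentinel.
    while True:
        candidate = attempts.get()
        if candidate is None:
            break
        results.put((candidate, evaluate(candidate, (1, 2))))
        attempts.task_done()


attempts, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(attempts, results)) for _ in range(4)]
for w in workers:
    w.start()

candidates = ['lambda x, y: x + y', 'lambda x, y: x * y']
for cand in candidates:
    attempts.put(cand)
attempts.join()              # wait until every attempt has been checked
for _ in workers:
    attempts.put(None)       # one sentinel per worker to shut them down

found = dict(results.get() for _ in candidates)
```

The checker then just compares each result against the expected output ("3" for the "1 2" example).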

smassey