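In a conventional language like Python, you would be tempted to solve this problem with something like this:

```python
import re

result = []
with open('file.txt') as f:
    for line in f:
        # strip a trailing // comment; '.' does not match the newline, so it is kept
        line = re.sub(r'//.*', '', line)
        result.append(line)
```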
In Prolog, you will instead find it simpler to write a full DCG for your input, treating it as a grammar. Having a more powerful parsing framework right there in the core has sort of kept Prolog from developing a large and complex suite of string- and character-banging functions. So I would expect that even if you did read your input as strings, you would soon be stuck again, this time for want of a regular expression library or ways of slicing and dicing strings that largely aren't there.
As with everything in Prolog, it's wordier than you're probably used to, but there are advantages that are probably not obvious at the outset. Here's the code I came up with for your toy problem (it took me about 15 minutes):
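```prolog
:- use_module(library(pio)).          % phrase_from_file/2
:- use_module(library(dcg/basics)).   % string//1, string_without//2, whites//0

% An optional trailing "//" comment, up to (not including) the newline.
comment --> "//", string_without("\n", _).
comment --> [].

% Shortest run of codes that lets the rest of the rule succeed, as an atom.
optarget(A) --> string(S), { atom_codes(A, S) }.

% One line: opcode, a space, optional extra whitespace, target, optional comment.
instruction(inst(Op, Target)) --> optarget(Op), " ", whites,
    optarget(Target), whites, comment, "\n".

% A file is zero or more instruction lines.
instructions([Inst|Rest]) --> instruction(Inst), instructions(Rest).
instructions([]) --> [].
```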
This will parse your example into something like this:
```prolog
?- phrase_from_file(instructions(Inst), "test.txt").
Inst = [inst(brz, 'END'), inst(sub, 'ONE'), inst(sta, 'SECOND'),
        inst(lda, 'RESULT'), inst(add, 'FIRST'), inst(bra, 'LOOP')] .
```
You should not feel like you are "abusing" `dcg/basics` by using it for things unrelated to HTTP. The library was split out of the HTTP package some time ago precisely because of its general usefulness.
- I'm using `whites` here to discard whitespace, but because it also succeeds on zero characters, you need an explicit space between the two `optarget//1` calls.
- There are more interesting things you could do instead of `optarget//1`, like parsing only your real instructions or only your real arguments, but I don't know what they are, so you're getting atoms here.
- If it turns out your instructions take more arguments, you can add more `instruction//1` rules to handle each shape individually. That's probably what I would do, anyway.
- If you find that a different representation would suit your downstream processing better, it should be fairly easy to produce it by changing `instruction//1` or `instructions//1`.
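To make the point about `whites` concrete, here is a small sketch that loads on its own in SWI-Prolog. The names `strict//1` and `loose//1` are hypothetical, comment handling is left out, and the input is an in-memory code list given to `phrase/2` rather than a file. With the explicit `" "`, the first parse of `brz END` is `inst(brz, 'END')`. With `whites` alone, the lazy `string//1` inside the first `optarget//1` is free to match nothing, so the first parse found is `inst('', 'brz END')`.

```prolog
:- use_module(library(dcg/basics)).   % string//1, whites//0

optarget(A) --> string(S), { atom_codes(A, S) }.

% Like instruction//1 above, minus comments: an explicit " " separates the fields.
strict(inst(Op, Target)) -->
    optarget(Op), " ", whites, optarget(Target), whites, "\n".

% Hypothetical variant relying on whites//0 alone. Since whites//0 also
% matches zero characters, nothing forces the split to happen at the space.
loose(inst(Op, Target)) -->
    optarget(Op), whites, optarget(Target), whites, "\n".
```

Backtracking into `loose//1` with `;` yields further wrong splits, such as `inst(b, 'rz END')`, before it reaches the intended one; the explicit space rules all of those out.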
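Along the lines of the second and third points, here is a sketch of a tighter grammar. Everything specific in it is an assumption: the opcode table only lists the mnemonics visible in the sample output, labels are assumed to be runs of upper-case letters, and `hlt` is a made-up zero-operand instruction standing in for whatever differently shaped instructions you really have. Each shape gets its own `instruction//1` clause; `comment//0` and `instructions//1` are the same as before.

```prolog
:- use_module(library(dcg/basics)).   % string_without//2, whites//0

% Hypothetical opcode table: just the mnemonics from the sample output.
opcode(brz) --> "brz".
opcode(sub) --> "sub".
opcode(sta) --> "sta".
opcode(lda) --> "lda".
opcode(add) --> "add".
opcode(bra) --> "bra".

% Hypothetical label syntax: one or more upper-case letters (longest first).
label(L) --> upper_codes(Cs), { atom_codes(L, Cs) }.
upper_codes([C|Cs]) --> [C], { code_type(C, upper) }, upper_codes(Cs).
upper_codes([C])    --> [C], { code_type(C, upper) }.

comment --> "//", string_without("\n", _).
comment --> [].

% Opcode plus one label operand.
instruction(inst(Op, Target)) -->
    opcode(Op), " ", whites, label(Target), whites, comment, "\n".
% Hypothetical zero-operand instruction, handled by its own rule.
instruction(inst(hlt)) -->
    "hlt", whites, comment, "\n".

instructions([Inst|Rest]) --> instruction(Inst), instructions(Rest).
instructions([]) --> [].
```

With this version, a misspelled mnemonic or a lower-case label makes the whole parse fail instead of quietly turning into an atom, which is usually what you want from an assembler front end.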