
I am writing a Prolog program for tokenization. Currently, I can read the input as a list of ASCII codes, but I don't know how to turn that list into a list of tokens.

For example, if I have:

[105,110,116,32,105,110,116,32,97,32,13,10,105,110,116,32],

how do I obtain: [int,int,a,int]?

I know the key is to split the list at each 32 (the space code), take everything before it, and turn a run such as [105,110,116] into 'int'. I am new to lists and not yet familiar with Prolog. Any help?

false
zihaow
  • Just a general remark: To process text, you can also use lists of characters. They are much more readable. In your case: `[i,n,t,' ',i,n,t,' ',a,' ','\r','\n',i,n,t,' ']`. See [this](http://stackoverflow.com/a/8269897/772868) for more! – false Mar 13 '16 at 16:19
  • @false Is there any predicate for appending ' ' to special characters like '?'? – zihaow Mar 18 '16 at 00:11
  • I don't get your question. A list of characters with `?` is `"?"` or `[?]`. – false Mar 18 '16 at 10:17

1 Answer

Basic knowledge of DCGs (definite clause grammars) would be useful here. Suppose you read the ASCII codes with read_line_to_codes/2, i.e. read_line_to_codes(user_input,X). With the input {} you receive X = [123, 125]. Now define a DCG like this:

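% lekserr(-Tokens)//: translate a list of character codes into tokens.
lekserr(Tokens) -->
(   % Try each known lexeme in turn and commit to the first match.
    ( "{", !, {Token = tkLBrace}
    ;    "}", !, {Token = tkRBrace}
    ;    "int", !, {Token = tkInt}
    ),
    !,
    {Tokens = [Token| TokList]},
    % Tokenize the rest of the input.
    lekserr(TokList)
;   % Nothing left to match: end of the token list.
    [],
    {Tokens = []}
).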

When you run this query and type int as the input line, you get:

read_line_to_codes(user_input,X), phrase(lekserr(Y),X).
|    int
X = [105, 110, 116],
Y = [tkInt].

This is only a brief sketch, and I hope you find it useful. Consider adding a clause to ignore whitespace.
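
As a rough illustration of the whitespace suggestion, here is a minimal sketch that skips whitespace and turns each remaining run of codes into an atom, which is the shape of result the question asks for. It is not part of the lexer above: the names tokens//1, tokens_//1, word//1, word_rest//1, nonblank//1 and skip_ws//0 are hypothetical helpers introduced for this example, and it assumes SWI-Prolog, where code_type/2 and atom_codes/2 are built in.

```prolog
% tokens(-Tokens)//: Tokens is the list of atoms formed from the
% whitespace-separated runs of codes in the input (hypothetical helper).
tokens(Tokens) --> skip_ws, tokens_(Tokens).

tokens_([Token|Tokens]) -->
    word(Codes), !,
    { atom_codes(Token, Codes) },   % e.g. [105,110,116] becomes int
    tokens(Tokens).
tokens_([]) --> [].

% word(-Codes)//: the longest non-empty run of non-whitespace codes.
word([C|Cs]) --> nonblank(C), word_rest(Cs).

word_rest([C|Cs]) --> nonblank(C), !, word_rest(Cs).
word_rest([])     --> [].

% nonblank(-C)//: consume one code that is not whitespace.
nonblank(C) --> [C], { \+ code_type(C, space) }.

% skip_ws//0: consume any run of spaces, tabs, CR and LF.
skip_ws --> [C], { code_type(C, space) }, !, skip_ws.
skip_ws --> [].
```

With the code list from the question, the query would look like this:

```prolog
?- phrase(tokens(Ts), [105,110,116,32,105,110,116,32,97,32,13,10,105,110,116,32]).
Ts = [int, int, a, int].
```

This sketch treats every non-whitespace run as one token, so punctuation such as { or } would need its own clauses, as in the lexer above.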

whd