Convert escape sequences from user input into their real representation

Question

I'm trying to write an interpreter for LOLCODE that reads escaped strings from a file in the form:

VISIBLE "HAI \" WORLD!"

For which I wish to show an output of:

HAI " WORLD!

I have tried to dynamically generate a format string for printf in order to do this, but it seems that escape sequences are only translated when a string literal is compiled, not at run time.

In essence, what I am looking for is exactly the opposite of this question: Convert characters in a c string to their escape sequences

Is there any way to go about this?

`if(str[i] == '\\') { switch(str[++i]) { case 'a': printf("\a"); break; ... } }` Well, this seems to be the easiest way of going about doing things, but probably doesn't deal with all the escape characters. Is there a more elegant way? — peteykun, Feb 14 '13 at 10:29

score 3 · Accepted Answer · answered Sep 14 '13 at 14:23

It's a pretty standard scanning exercise. Depending on how close you intend to be to the LOLCODE specification (which I can't seem to reach right now, so this is from memory), you've got a few ways to go.

Write a lexer by hand

It's not as hard as it sounds. You just want to analyze your input one character at a time, while maintaining a bit of context information. In your case, the important context consists of two flags:

One to remember that you're currently lexing a string. It is set on reading the opening " and cleared on reading the closing ".
One to remember that the previous character was an escape. It is set on reading \ and cleared on reading the character after that, no matter what it is.

Then the general algorithm looks like: (pseudocode)

loop on: c ← read next character
  if not inString
    if c is '"' then clear buf; set inString
    else [out of scope here]
  else if inEscape then append c to buf; clear inEscape
  else if c is '"' then clear inString; return buf as result
  else if c is '\' then set inEscape
  else append c to buf
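
As a rough illustration, the pseudocode above could be written in C along these lines. The function name lex_string, its signature, and the -1 error convention are made up for this sketch and are not taken from any existing interpreter; it also handles only the first string literal in the input and keeps every escaped character as-is.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Copies the contents of the first double-quoted string in `src` into `out`,
 * dropping the backslash of each escape and keeping the character after it.
 * Returns the number of characters written, or -1 if the string is never
 * closed or `out` is too small. (Hypothetical helper, for illustration only.)
 */
int lex_string(const char *src, char *out, size_t outsz)
{
    bool inString = false;  /* true between the opening and closing quote */
    bool inEscape = false;  /* true when the previous character was '\' */
    size_t n = 0;           /* number of characters stored in out */

    if (outsz == 0)
        return -1;

    for (; *src != '\0'; ++src) {
        char c = *src;

        if (!inString) {
            if (c == '"') {         /* opening quote: start a fresh buffer */
                n = 0;
                inString = true;
            }
            /* anything else is outside the scope of this sketch */
        } else if (inEscape) {      /* escaped character: take it literally */
            if (n + 1 >= outsz)
                return -1;
            out[n++] = c;
            inEscape = false;
        } else if (c == '"') {      /* closing quote: the string is complete */
            out[n] = '\0';
            return (int)n;
        } else if (c == '\\') {     /* start of an escape sequence */
            inEscape = true;
        } else {                    /* ordinary character */
            if (n + 1 >= outsz)
                return -1;
            out[n++] = c;
        }
    }
    return -1;                      /* ran out of input inside a string */
}
```

Called on the line from the question, VISIBLE "HAI \" WORLD!", this fills the buffer with HAI " WORLD! and returns 12.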

You might want to refine the inEscape case should you want to implement \r, \n and the like.
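
One possible refinement, sketched in C: translate the character that follows the backslash before appending it to the buffer. The helper name unescape_char is hypothetical, and the table assumes C-style escapes as used in the question; the actual LOLCODE specification may define a different set, so check it before relying on this mapping.

```c
/*
 * Maps the character that follows a backslash to the character it stands
 * for. Unknown escapes map to themselves, so \" yields " and \\ yields \.
 * (Hypothetical helper; the escape set is an assumption based on C.)
 */
char unescape_char(char c)
{
    switch (c) {
    case 'a': return '\a';  /* bell */
    case 'b': return '\b';  /* backspace */
    case 'f': return '\f';  /* form feed */
    case 'n': return '\n';  /* newline */
    case 'r': return '\r';  /* carriage return */
    case 't': return '\t';  /* horizontal tab */
    case 'v': return '\v';  /* vertical tab */
    default:  return c;     /* quote, backslash, anything else */
    }
}
```

In the inEscape branch of the algorithm, one would then append unescape_char(c) instead of c itself.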

Use a lexer generator

The traditional tools here are lex and flex.

Get inspiration

You're not the first one to write a LOLCODE interpreter. There's nothing wrong with peeking at how the others did it. For example, here's the string parsing code from lci.
