
I need to be able to handle data that can look like:

set setting1 "bind button_x +actionslot1;bind button_y \" bind button_x +stance \" "

bind button_a jump

set setting2 1 1 0 1

toggle setting_3 " \"value 1\" \"value 2\" \"value 3\" "

These are examples of commands for a game's console, and I'm trying to write an emulator of sorts that interprets them the same way the game does.

The first thing that comes to mind is regex, but I'm not sure it's the best option. For example, to match the value of a setting I might try something like /set [\w_]+ "?(.+)"?/, but the wildcard also matches the closing quote because it's greedy. If I make it lazy, it stops at a quote inside the value instead. If I keep it greedy and stop it from matching quotes, it no longer matches the escaped quotes in the value.

Even if there are possible regex solutions, they seem like the wrong option. I asked before how programs like Visual Studio and Notepad++ know which parentheses and curly braces match, and I was told there is something similar to regex in some ways but much more powerful.

The only other thing I can think of is to go through the lines of code character by character and use booleans to track the state of the current character.
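A minimal C# sketch of that character-by-character idea, driven by two boolean flags, might look like the following. It assumes rules that may not match the game exactly: whitespace separates arguments unless it is inside double quotes, a double quote toggles the "inside quotes" flag and is not kept, and a backslash makes the next character literal. The names `ConsoleTokenizer` and `Split` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical hand-written tokenizer; the quoting/escaping rules are assumptions.
public static class ConsoleTokenizer
{
    public static List<string> Split(string line)
    {
        var args = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;  // between an opening and a closing quote?
        bool escaped = false;   // was the previous character a backslash?
        bool hasToken = false;  // lets "" count as an (empty) argument

        foreach (char c in line)
        {
            if (escaped)
            {
                current.Append(c);      // the escaped character is taken literally
                escaped = false;
                hasToken = true;
            }
            else if (c == '\\')
            {
                escaped = true;         // decide what to do on the next character
            }
            else if (c == '"')
            {
                inQuotes = !inQuotes;   // quotes group text but are not kept
                hasToken = true;
            }
            else if (char.IsWhiteSpace(c) && !inQuotes)
            {
                if (hasToken)           // whitespace outside quotes ends an argument
                {
                    args.Add(current.ToString());
                    current.Clear();
                    hasToken = false;
                }
            }
            else
            {
                current.Append(c);
                hasToken = true;
            }
        }

        if (escaped) current.Append('\\');           // keep a trailing lone backslash
        if (hasToken || escaped) args.Add(current.ToString());
        return args;                                  // an unclosed quote simply runs to the end
    }
}
```

Under these rules the `set setting1` line yields three arguments, the third being `bind button_x +actionslot1;bind button_y " bind button_x +stance " `. The value of `toggle setting_3` can be passed through `Split` a second time to get `value 1`, `value 2` and `value 3`. The `setting4` line from the edit, however, falls apart into eight pieces, so those `"\"` sequences would need a more precise rule before any approach could handle them.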

What are my options here? What do game developers use to handle console commands?

edit: Here's another possible command which strongly deters me from using regex:

set setting4 "bind button_a \" bind button_b "\" set setting1 0 \" " \" "

The commands include not just escaped quotes, but also sequences like "\" inside escaped quotes.

mowwwalker
  • This problem is ancient, and as far as programming languages are concerned, pretty much solved. There are plenty of good ways to parse stuff. Actually, there are plenty of ready-made embeddable programming languages - just use one of those if you want to add scripting to your game. That's what many professional games do (Lua is quite popular). Parsing *is* a really interesting and broad topic, but building your own language isn't practical. Besides, chances are you'll do it badly and torture your users. –  Jan 12 '12 at 20:33
  • The syntax above can be processed by any "shell command parser" worth its salt. Generally it is *not* done with a monolithic regular expression as *correct* processing of escaped "s can be problematic. Some regular expression implementations -- e.g. Perl -- *do* support the required functionality to perform this correctly, but it is generally outside the realm of Regular Languages for the generalized case, and can greatly increase the complexity of the regular expression. –  Jan 12 '12 at 20:41
  • Regarding your edit: Better forget that and keep the lexical structure simple. While it would be certainly possible *if you made up definite rules* on what's a string literal, it makes both tools and human comprehension needlessly hard. –  Jan 12 '12 at 20:43
  • @pst, where do I learn how to write something that will do it? – mowwwalker Jan 12 '12 at 20:45
  • @Walkerneo Lecture note from an introductory compilers course :-) This problem is broken up into two (or more) phases -- lexical analysis (where the input is turned into a stream of tokens, e.g. [word] [num] [string] [string]), and parsing (where something is done with the tokens). –  Jan 12 '12 at 20:48
  • http://stackoverflow.com/questions/197233/how-to-parse-a-command-line-with-regular-expressions , http://stackoverflow.com/questions/900087/c-sharp-command-line-parsing-of-quoted-paths-and-avoiding-escape-characters –  Jan 12 '12 at 20:53
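The two phases described in the lecture-note comment can be sketched in C# roughly as follows. This is a hedged illustration, not how any particular game does it: the token kinds (`Word`, `Number`, `String`), the escaping rule (a backslash makes the next character literal) and the names `TwoPhase`, `Lex`, `Parse`, `Token` and `Command` are all assumptions. Regular expressions still appear, but each one only recognizes a single small token at the current position, which sidesteps the trouble of writing one pattern for a whole command.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical token kinds and result types, chosen for this sketch only.
public enum TokenKind { Word, Number, String }
public record Token(TokenKind Kind, string Text);
public record Command(string Name, List<Token> Args);

public static class TwoPhase
{
    // One small pattern per token kind, anchored at the current position by \G:
    //   string: "..." in which a backslash escapes the next character
    //   number: optionally signed/decimal digits followed by whitespace or end of line
    //   word:   any other run of non-space, non-quote characters
    private static readonly Regex TokenPattern = new Regex(
        @"\G\s*(?:""(?<string>(?:\\.|[^""\\])*)""|(?<number>-?\d+(?:\.\d+)?)(?=\s|$)|(?<word>[^\s""]+))");

    // Phase 1, lexical analysis: text -> flat list of typed tokens.
    public static List<Token> Lex(string line)
    {
        var tokens = new List<Token>();
        int pos = 0;
        while (!string.IsNullOrWhiteSpace(line.Substring(pos)))
        {
            Match m = TokenPattern.Match(line, pos);
            if (!m.Success)
                throw new FormatException($"Cannot read a token at position {pos}.");

            if (m.Groups["string"].Success)  // remove the escapes: \" -> "
                tokens.Add(new Token(TokenKind.String, Regex.Replace(m.Groups["string"].Value, @"\\(.)", "$1")));
            else if (m.Groups["number"].Success)
                tokens.Add(new Token(TokenKind.Number, m.Groups["number"].Value));
            else
                tokens.Add(new Token(TokenKind.Word, m.Groups["word"].Value));

            pos = m.Index + m.Length;        // continue right after this token
        }
        return tokens;
    }

    // Phase 2, parsing: give the tokens meaning. The "set" rule is an assumed example.
    public static Command Parse(List<Token> tokens)
    {
        if (tokens.Count == 0 || tokens[0].Kind != TokenKind.Word)
            throw new FormatException("A command must start with a word.");

        var args = tokens.GetRange(1, tokens.Count - 1);
        if (tokens[0].Text == "set" && (args.Count < 2 || args[0].Kind != TokenKind.Word))
            throw new FormatException("Usage: set <name> <value>");

        return new Command(tokens[0].Text, args);
    }
}
```

For `set setting2 1 1 0 1`, `Lex` yields two words followed by four numbers. For the `toggle setting_3` line, the quoted part comes back as a single `String` token with the escapes removed, and lexing that text again yields three `String` tokens.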

2 Answers


I don't want to keep you on the path of regex -- you are correct that there are non-regex solutions that may be more appropriate (I just don't know what they are). However, here is one possible regex that should fix your quotes issue:

/set [\w_]+ "?((\\"|[^"])+)"?/

I changed .+ to (\\"|[^"])+. Basically, it matches occurrences of \" OR of anything that isn't a quote. In other words, it will match anything except quotes that aren't escaped.

Again, if someone can suggest a more sophisticated non-regex solution, you should strongly consider it.

Edit: The updated example you've provided breaks this solution, and I think it would break any regex solution.
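A quick way to check both the fix and its limit is to run the pattern from this answer against the question's `setting1` line and the `setting4` line from its edit. This is a small C# sketch; `AnswerRegexDemo` and `CaptureValue` are names made up for it.

```csharp
using System.Text.RegularExpressions;

public static class AnswerRegexDemo
{
    // The answer's pattern; in a verbatim string "" stands for a single quote.
    private static readonly Regex SetPattern = new Regex(@"set [\w_]+ ""?((\\""|[^""])+)""?");

    // Returns group 1 (the value), or null when the line doesn't match.
    // Escapes are NOT removed: a \" in the input stays as \" in the result.
    public static string CaptureValue(string line)
    {
        Match m = SetPattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

The `setting1` value comes back whole (with its `\"` escapes still in place, so they would need unescaping separately), and `set setting2 1 1 0 1` gives `1 1 0 1`. The `setting4` value, however, is cut off at `bind button_a \" bind button_b `, which is exactly the breakage described in the edit.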

Edit 2: Here is a C# string version of your regex. It uses @ to tell the compiler to treat the string as a verbatim literal, which means it ignores \ as an escape character. The only caveat is that in order to represent " in a verbatim literal you have to type it as "", but it's still better than having backslashes everywhere. Given the prevalence of escape sequences in regexes, I recommend using verbatim literals anywhere that you have to type a regex in a string.

string pattern = @"set [\w_]+ ""?((\\""|[^""])+)""?";
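For comparison, here is the same pattern written as an ordinary C# string, where every backslash and quote has to be escaped. This small sketch (the class name `VerbatimDemo` is made up) just shows that both literals produce identical text:

```csharp
public static class VerbatimDemo
{
    // Verbatim literal: backslashes are literal, "" means one quote.
    public const string Verbatim = @"set [\w_]+ ""?((\\""|[^""])+)""?";

    // Ordinary literal: every \ becomes \\ and every " becomes \".
    public const string Regular = "set [\\w_]+ \"?((\\\\\"|[^\"])+)\"?";
}
```

Printing either constant shows `set [\w_]+ "?((\\"|[^"])+)"?`, the regex from the answer; only the verbatim form is still readable as a regex.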
ean5533
  • You realize regex is a poor way to do it but you've never heard of more sophisticated parsing algorithms? I mean, it's a good start, I'm just surprised. –  Jan 12 '12 at 20:35
  • @delnan: Should I apologize for my ignorance? – ean5533 Jan 12 '12 at 20:36
  • @ean5533, I updated the question. I forgot to mention that commands can have non-escaped quotes inside quotes. – mowwwalker Jan 12 '12 at 20:39
  • I don't see why you should. I'm mostly surprised and wanted to check if I got something wrong. The stances "regular expressions solve everything" and "regular expressions are cruft, write a real parser" are far more common. –  Jan 12 '12 at 20:40
  • @delnan What can I say? I recognize that better solutions exist, but I've never needed to solve a task like this myself so I've never bothered to learn. I just wanted to chip in my support. – ean5533 Jan 12 '12 at 20:43
  • @Walkerneo If I'm not mistaken, your updated example proves that the text you're parsing is non-regular, and thus can't be parsed by a regular expression. – ean5533 Jan 12 '12 at 20:44
  • @ean5533: given that your answer history reflects knowledge of C#, I would recommend looking at http://stackoverflow.com/questions/4396080/antlr-3-3-c-sharp-tutorials. The truth is that MOST parsing problems you will find are simple data files, which usually have trivial grammars compared to programming languages. It is unfortunate that most tutorials on parsers focus on complex grammars (typically attached to a compilers course). But a little knowledge on parsing can save you a lot of hand-coding work. Most knowledge of one parsing tool can carry over to other tools/libraries. – ccoakley Jan 12 '12 at 21:07
  • I've decided to just use this and say to hell with the quotes inside escaped quotes. The lexers and parsers and things are what I should have used, but there's too much to learn to use them. – mowwwalker Jan 12 '12 at 21:15
  • @ccoakley So I kind of lied; I actually have worked with parsers before in college. We wrote some simple grammars using Bison/YACC to parse math problems like the ones in your link. But that was 4 years ago and it was a skill I never used, thus it was all forgotten. Your link brings it all back to mind though. Thanks for sending it my way. – ean5533 Jan 12 '12 at 21:16
  • @ean5533, Is there a way to clean up how the regex pattern looks in C#? I don't want to have to escape all my quotes and backslashes because the original meaning of the pattern ends up being lost to me. – mowwwalker Jan 12 '12 at 21:23

I would suggest you read about Lexical Analysis, the process of tokenizing your text according to a grammar. I think it will help you with what you are trying to do.

Aviram Segal
  • Do you have any non-Wikipedia links I can read? Where do I learn about lexers? – mowwwalker Jan 12 '12 at 20:42
  • @Walkerneo Not sure what programming language you are going to use; different lexers use different grammars/methods. I would read a little just to understand the concept, find a lexer of my choice and read its documentation. I mostly develop in Java and used JFlex and JavaCC in the past. – Aviram Segal Jan 12 '12 at 20:46
  • I'm using C#. Is a lexer something that's generally written custom, or is it something like jQuery that other people work on? – mowwwalker Jan 12 '12 at 20:48
  • @Walkerneo: Unless the language you implement happens to have a complete, re-usable, publicly available, free-standing lexer (and that's very rare, and not possible if you define the language yourself), you'll have to roll your own. Lexers are highly input-language-specific and not worth much on their own, so nobody bothers. There are lexer generators though - those automate the hardest, most annoying and least rewarding parts of the parser. Using one of those is highly recommended. –  Jan 12 '12 at 20:51
  • Check this question it will help you [C#/.NET Lexer Generators](http://stackoverflow.com/questions/172189/c-net-lexer-generators) – Aviram Segal Jan 12 '12 at 20:51