One of the first things that made me want to program was the idea of creating a multiplayer text game. I was scared away from the notion, though, when I realized how complicated writing a smart parser would be, at least for me at the time.

So now I'm back to thinking about it, and I've tried to do lots of research on the subject. It turns out to be a lot more involved than I thought, and I've stumbled across terms such as lexing, tokenizing, and parsing, only the last of which I knew before. I figured the field of lexical analysis was what I wanted to look into.

So instead of trying to write my own lexer and parser, which I've read is very difficult and error-prone and which most people advise against, I thought I would find a good lexer and parser generator. It would supposedly do all the heavy lifting so that I could focus on the grammar I want. I've also heard lots of people say that anyone looking to do such a thing should simply use Inform.

Sure, I guess Inform is cool, but C# is my language of choice and I like the freedom that it allows me over what I perceive Inform to offer. I'm more interested in creating all the components and framework of the multiplayer text game than I am in any one particular final result, and so I like the idea of using a standard programming language best.

I've been trying to find a good lexer/parser generator for C# for a while now, but judging by the comments people give, I'm not really satisfied with anything on offer.

ANTLR for C# seems to be underdeveloped and mostly an afterthought. I've tried to understand GPLEX and GPPG, but right now they are far too confusing for me, despite reading a lot of the documentation and reading up on lexing in general.

I have a lot of the concepts in my head about the whole process of lexing, but when faced with a lexer and a parser, I guess I don't really know how these are supposed to be meshed into my actual code.

I want to build a simple English-like grammar with noun phrases and verb phrases, and to have lists of nouns and verbs that can be added dynamically and read from a database as the game is developed and expanded.

I guess I'm feeling like I'm turning up short on the results of the research I've been trying to do with this subject.

To be honest, the idea of creating my own bastardized notion of a lexer and parser based on what I've researched seems far more appealing at this point than using any lexer/parser generator that I've read about.

svick
Cowman
  • One tip: Just try writing your own from scratch, if this is a learning exercise. Start simple, expand. One thing I found helpful for more complex phrases: Give objects a list of words that represent what they are, and give the words different priorities. – uliwitness Jan 06 '17 at 19:01
  • E.g. Inform and TADS have nouns and adjectives attached to an object. The 'adjectives' are not necessarily adjectives, just 'less important' parts of the word. That way, if the user types `plant pot plant in plant pot`, you know the first word must be a verb. Then `pot plant` and `plant pot` are made up of the same two words, so you check which object has its noun as the later word. The object `pot plant` is `adjective(pot),noun(plant)` (a plant in a pot), so it has priority in the first case. `plant pot` is `noun(pot),adjective(plant)` (a pot for plants), so it matches the second case better. – uliwitness Jan 06 '17 at 19:03
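
The noun/adjective priority idea from the comment above can be sketched roughly as follows. This is an illustration in Python, not how Inform or TADS actually implement it; `GameObject` and `best_match` are hypothetical names, and the rule "prefer the object whose noun appears latest in the phrase" is one simple reading of the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameObject:
    # Hypothetical object record: 'nouns' are the important words,
    # 'adjectives' are the less important ones.
    name: str
    nouns: frozenset
    adjectives: frozenset


def best_match(phrase, objects):
    """Pick the object whose noun appears latest in the typed phrase."""
    words = phrase.lower().split()
    best, best_pos = None, -1
    for obj in objects:
        # Every typed word must be known to the object in some role.
        if not all(w in obj.nouns or w in obj.adjectives for w in words):
            continue
        noun_positions = [i for i, w in enumerate(words) if w in obj.nouns]
        if not noun_positions:
            continue  # only adjectives matched; not a usable reference
        pos = max(noun_positions)
        if pos > best_pos:
            best, best_pos = obj, pos
    return best


OBJECTS = [
    GameObject("pot plant", frozenset({"plant"}), frozenset({"pot"})),
    GameObject("plant pot", frozenset({"pot"}), frozenset({"plant"})),
]
```

With these two objects, `best_match("pot plant", OBJECTS)` picks the plant and `best_match("plant pot", OBJECTS)` picks the pot, because in each case the chosen object's noun is the last word typed.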

2 Answers

First, some meta-advice. This is maybe not a great question for StackOverflow since it is not about a specific technical problem. You might consider asking questions like this on Programmers.StackExchange.com instead.

To address your question: I agree with tallseth; it depends on what skill you want to improve here. People occasionally ask me why I don't build a boat, since I am fairly handy and like sailing. Because you should only build a boat if you like having an unfinished boat in your garage for two years. If you want to sail, sail. If you want to build a boat, build a boat. But don't build a boat because you want to sail. If you want to write a text adventure, learn Inform7; it is the most incredibly awesome tool for making text adventures the world has ever seen. If you want to learn how to make a lexer and parser, then don't mess around with parser generators. Make a lexer and parser.

Parser generators are great if you are prototyping up a new programming language and you want to get going quickly. But programming language grammars can have really nice properties that make them amenable to automatic parser generation in a way that natural languages do not. You're likely to spend as much time fighting with the parser generator as you spend building the grammar if you go with a parser generator approach.

Since you're a beginner at this, I suggest that you recapitulate the history of text adventures. Start by writing a lexer and parser that can parse two-word commands: "go north", "get sword", "drop brick", and so on. That will give you enough flexibility that you can then start solving other problems in text adventure design: how to represent objects and locations. Start small; if you can make a two-room game where the player can pick up and put down objects, you are well on your way.
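
As a rough illustration of that starting point, here is a minimal sketch of a two-word command parser driving a two-room world. It is written in Python for brevity, and everything in it is an assumption made for the example: the verb list, the room names, the response strings, and the `parse` and `World` names.

```python
VERBS = {"go", "get", "drop"}  # assumed starter vocabulary


def parse(line):
    """Return (verb, noun) for a two-word command, or None if it does not fit."""
    words = line.lower().split()
    if len(words) != 2 or words[0] not in VERBS:
        return None
    return words[0], words[1]


class World:
    """A two-room world where the player can move, pick up, and put down items."""

    def __init__(self):
        # room -> {direction: neighbouring room}
        self.exits = {"cellar": {"north": "garden"}, "garden": {"south": "cellar"}}
        # room -> items currently lying there
        self.items = {"cellar": {"sword"}, "garden": {"brick"}}
        self.location = "cellar"
        self.inventory = set()

    def execute(self, line):
        command = parse(line)
        if command is None:
            return "I don't understand."
        verb, noun = command
        if verb == "go":
            target = self.exits[self.location].get(noun)
            if target is None:
                return "You can't go that way."
            self.location = target
            return f"You are in the {target}."
        here = self.items[self.location]
        if verb == "get":
            if noun not in here:
                return f"There is no {noun} here."
            here.remove(noun)
            self.inventory.add(noun)
            return f"Taken: {noun}."
        # The only remaining verb is "drop".
        if noun not in self.inventory:
            return f"You are not carrying a {noun}."
        self.inventory.remove(noun)
        here.add(noun)
        return f"Dropped: {noun}."
```

Feeding it `get sword`, `go north`, `drop sword` moves the sword from the cellar to the garden. The parser is only a few lines; most of the code is already about representing objects and locations.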

In short, I strongly encourage you to follow your instinct; writing your own lexer and parser is super fun and actually not that hard if you start small and work your way up.

Eric Lippert
  • I didn't even really know about Programmers.StackExchange. Thank you, and thank you for the great answer. Inform7 certainly isn't what I want here, since I'm not interested in the end result so much as I am in the process of getting there. – Cowman Jan 26 '13 at 22:32
  • I also appreciate your advice. I have been trying to make a text-based adventure game from scratch, and had been working on the parser and underlying structure for some time, but still had no idea what I wanted the game to be about. It's obvious to me after reading this, though, that I really wanted to make the structure and the parser in the first place, not the game. – Brian Peterson Mar 20 '13 at 23:54

It depends on what you want to get out of the project.

If this is a learning project to do in your spare time, rolling your own tokenizer and interpreter is a good solution. While it is hard to do this perfectly, it is fairly easy to make a sufficient parsing layer. I'd recommend making the most minimal parsing layer you can to start with, and incrementally improving it as you add richness to the game. This is a good type of problem to practice TDD and design patterns with, if that interests you. Using those practices will make it easier to improve the parser incrementally.

On the other hand, if you will not get a lot of value from that exercise (you're on a tight timeline, this is for real work, you just don't care about the parsing layer, etc.), then I suggest making do with an existing parser. If you encapsulate a third-party parser behind an abstraction, you can change your mind later about writing your own, or about which parser to use.

You probably already saw it, but this SO question looks to have a nice list of parsers for C#.
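
A minimal sketch of that encapsulation, in Python and with hypothetical names throughout (`Command`, `CommandParser`, `SplitOnSpacesParser`, `Game`): the game depends only on a small parser interface, so a hand-written parser and a wrapper around a third-party one are interchangeable.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    # The only shape the rest of the game ever sees.
    verb: str
    noun: Optional[str] = None


class CommandParser(ABC):
    """Abstraction the game codes against; implementations can be swapped."""

    @abstractmethod
    def parse(self, line: str) -> Optional[Command]:
        ...


class SplitOnSpacesParser(CommandParser):
    """Hand-rolled stand-in; a third-party parser would be wrapped the same way."""

    def parse(self, line: str) -> Optional[Command]:
        words = line.lower().split()
        if not words or len(words) > 2:
            return None
        return Command(words[0], words[1] if len(words) == 2 else None)


class Game:
    def __init__(self, parser: CommandParser):
        self.parser = parser  # injected, so it can be replaced later

    def handle(self, line: str) -> str:
        command = self.parser.parse(line)
        if command is None:
            return "I don't understand."
        return f"verb={command.verb} noun={command.noun}"
```

Switching parsers then means writing one new `CommandParser` subclass and changing the line that constructs `Game`; nothing else in the game needs to know.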

Community
tallseth
  • I guess one of the things that scares me about writing my own tokenizer is that all the others I've seen operate on such a low level, using finite state machines to do what they do. I don't even know if such a thing is necessary if I'm creating a tokenizer that works only for my project. When I first thought of making a tokenizer for my project, I figured it was about splitting my string on spaces, looking up each word in a `Dictionary` to figure out whether the word "jump" was a noun or a verb or whatever, and creating an ordered list of those tokens. – Cowman Jan 26 '13 at 22:36
  • @Cowman: That is a perfectly acceptable solution for a basic text adventure tokenizer. Remember, programming language tokenizers need to recognize that `-0.124E5` is a legal float and all that other stuff that makes tokens really complicated. Your tokenizer will be the least of your worries; it's the *semantic analysis* that's going to be tricky when you start to parse a non-trivial language in a virtual world where nouns and pronouns are ambiguous. – Eric Lippert Jan 26 '13 at 23:09
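
The split-and-look-up tokenizer described in these comments might look roughly like the sketch below. It is in Python rather than C#, and the `WordType` enum, the `VOCABULARY` contents, and the `tokenize` name are all made up for the example; in the game described in the question, the vocabulary would presumably be loaded from the database instead of hard-coded.

```python
from enum import Enum


class WordType(Enum):
    VERB = "verb"
    NOUN = "noun"
    UNKNOWN = "unknown"


# Assumed sample vocabulary; one type per word, which is a simplification.
VOCABULARY = {
    "jump": WordType.VERB,
    "get": WordType.VERB,
    "sword": WordType.NOUN,
    "brick": WordType.NOUN,
}


def tokenize(line, vocabulary=VOCABULARY):
    """Split on whitespace and tag each word with its type from the vocabulary."""
    return [
        (word, vocabulary.get(word, WordType.UNKNOWN))
        for word in line.lower().split()
    ]
```

No finite state machine is involved. The limitation is the one raised in the reply: a word such as "jump" can be a noun or a verb, and a single dictionary entry per word cannot express that, so the ambiguity has to be resolved in a later stage.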