
I was wondering what the most efficient way of parsing strings would be for text-based protocols like HTTP, FTP, SMTP, IMAP, IRC, etc., where a client communicates by sending commands to a server and reading its responses.

For example, let's say I would like to parse a typical IRC message.

    PING irc.example.com

What I am doing right now is splitting the response string into tokens and iterating through them. If the token is "PING", my program calls the pong function. However, at the moment, "parsing" these strings merely consists of a bunch of strcmp() calls.

I am curious about alternative, more efficient ways of parsing this data (I was thinking of something like a map of tokens, so my program can look them up easily).
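Following up on the map idea, here is a minimal C++ sketch (not a definitive implementation) of a dispatch table. The first token is looked up in a `std::unordered_map` instead of being compared against every known command with `strcmp()`. The names `Handler`, `tokenize`, `dispatch` and `makeTable` are hypothetical. The tokenizer is deliberately naive and ignores IRC prefixes and the `:`-prefixed trailing parameter.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical handler signature: receives the tokens after the command
// and returns the reply to send (empty string = no reply).
using Handler = std::function<std::string(const std::vector<std::string>&)>;

// Naive whitespace tokenizer (ignores prefixes and ':' trailing params).
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.push_back(tok);
    return tokens;
}

// One hash lookup on the command instead of a chain of strcmp() calls.
// Unknown or empty lines produce an empty reply.
std::string dispatch(const std::unordered_map<std::string, Handler>& table,
                     const std::string& line) {
    std::vector<std::string> tokens = tokenize(line);
    if (tokens.empty()) return "";
    auto it = table.find(tokens[0]);
    if (it == table.end()) return "";
    std::vector<std::string> params(tokens.begin() + 1, tokens.end());
    return it->second(params);
}

// Example table with a single hypothetical handler answering PING with PONG.
std::unordered_map<std::string, Handler> makeTable() {
    return {
        {"PING", [](const std::vector<std::string>& p) {
             return "PONG " + (p.empty() ? std::string() : p[0]);
         }},
    };
}
```

With this layout, `dispatch(makeTable(), "PING irc.example.com")` returns `"PONG irc.example.com"`. Lookup is average constant time rather than one comparison per known command, and supporting a new command means adding one table entry.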

Brian Tompsett - 汤莱恩
  • You can [parse HTML using regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). (SCNR) – sbi May 08 '11 at 06:39
  • Any reason you're not using `operator==` rather than `strcmp`? Also, a `std::map` would in all likelihood make the code much more readable. – Matthieu M. May 08 '11 at 10:19

2 Answers


Define a grammar for it, or simply write an automaton that detects your tokens. There is an example in this post.
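As a rough illustration of the automaton approach, the following C++ sketch scans a line one character at a time through a handful of states. It assumes the basic IRC layout from RFC 1459: an optional `:`-prefixed prefix, a command, space-separated parameters, and a final `:`-prefixed trailing parameter that may contain spaces. `IrcMessage` and `scan` are illustrative names. The sketch also assumes the trailing CR LF has already been stripped, and it does not validate the command.

```cpp
#include <string>
#include <vector>

// Result of scanning one line (illustrative type, not from any library).
struct IrcMessage {
    std::string prefix;               // text after a leading ':' (may be empty)
    std::string command;              // e.g. "PING"
    std::vector<std::string> params;  // middle params, then the trailing one
};

// Hand-written finite-state machine: every character is examined once and
// moves the scanner between a few states, with no backtracking.
IrcMessage scan(const std::string& line) {
    enum class State { Start, Prefix, Command, BetweenParams, Param, Trailing };
    State state = State::Start;
    IrcMessage msg;
    std::string cur;  // parameter currently being collected

    for (char c : line) {
        switch (state) {
        case State::Start:
            if (c == ':') state = State::Prefix;
            else if (c != ' ') { msg.command += c; state = State::Command; }
            break;
        case State::Prefix:
            if (c == ' ') state = State::Command;
            else msg.prefix += c;
            break;
        case State::Command:
            if (c == ' ') state = State::BetweenParams;
            else msg.command += c;
            break;
        case State::BetweenParams:
            if (c == ' ') break;  // tolerate repeated spaces
            if (c == ':') state = State::Trailing;
            else { cur = c; state = State::Param; }
            break;
        case State::Param:
            if (c == ' ') {
                msg.params.push_back(cur);
                cur.clear();
                state = State::BetweenParams;
            } else {
                cur += c;
            }
            break;
        case State::Trailing:
            cur += c;  // everything up to end of line, spaces included
            break;
        }
    }
    // Flush a parameter that was still open when the line ended.
    if (state == State::Param || state == State::Trailing) msg.params.push_back(cur);
    return msg;
}
```

For example, scanning `:nick!user@host PRIVMSG #chan :hello there` yields the prefix `nick!user@host`, the command `PRIVMSG`, and the params `#chan` and `hello there`. Because each character is examined exactly once, the cost is linear in the line length.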

atoMerz

Depending on how much you want to support, you've got a few options. At the first level is simple tokenizing like what you're doing, which only works for a very limited set of commands. Next up are regular expressions, which may give you a bit more flexibility. Finally, there's a full-blown grammar, as suggested in the other answer, which allows the greatest flexibility.

Each of these options is more complex than the last.
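To make the middle option concrete, here is a hedged C++ sketch that uses the standard library's `std::regex` to split a line into prefix, command and raw parameter text in a single match. The pattern is a simplification of IRC syntax assumed here for illustration: a command is either letters or a three-digit numeric reply. `IrcLine` and `matchLine` are hypothetical names.

```cpp
#include <optional>
#include <regex>
#include <string>

// Illustrative result type: parameters are kept as one raw string.
struct IrcLine {
    std::string prefix;
    std::string command;
    std::string params;
};

// One pattern for the whole line:
//   group 1: optional prefix (without the leading ':')
//   group 2: command, letters or a 3-digit numeric
//   group 3: everything after the command (unsplit)
std::optional<IrcLine> matchLine(const std::string& line) {
    // static: the pattern is compiled once, not on every call
    static const std::regex pattern(R"(^(?::(\S+) )?([A-Za-z]+|\d{3})(?: (.*))?$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return std::nullopt;
    // Unmatched optional groups yield empty strings from str().
    return IrcLine{m[1].str(), m[2].str(), m[3].str()};
}
```

Marking the regex `static` means it is built only once. Note that the parameters still come back as a single string. Splitting them properly (middle vs. trailing) is where this approach starts to strain, which is one reason to move on to a grammar.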

Chris Eberle
  • I disagree with regular expressions coming second; writing a recursive descent parser is not so hard (even without a full-blown grammar). – Matthieu M. May 08 '11 at 10:19
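To illustrate the comment's point, a small recursive-descent parser in C++ needs only one function per grammar rule and no parser generator. This is a sketch under assumptions. The grammar in the code comment is a simplified reading of the IRC message format in RFC 1459, and `ParsedLine` and `LineParser` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct ParsedLine {
    std::string prefix;
    std::string command;
    std::vector<std::string> params;
};

// One member function per rule of a simplified IRC grammar (assumed):
//   message := [ ':' word ' ' ] word params
//   params  := ' ' ( ':' rest | word params ) | <empty>
class LineParser {
public:
    explicit LineParser(std::string text) : s_(std::move(text)) {}

    ParsedLine parseMessage() {
        ParsedLine msg;
        if (peek() == ':') {      // optional prefix
            ++pos_;
            msg.prefix = word();
            expect(' ');
        }
        msg.command = word();
        if (msg.command.empty()) throw std::runtime_error("missing command");
        parseParams(msg.params);
        return msg;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    char peek() const { return pos_ < s_.size() ? s_[pos_] : '\0'; }

    void expect(char c) {
        if (peek() != c) throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos_;
    }

    // word := run of non-space characters (possibly empty)
    std::string word() {
        std::size_t start = pos_;
        while (pos_ < s_.size() && s_[pos_] != ' ') ++pos_;
        return s_.substr(start, pos_ - start);
    }

    // params := ' ' ( ':' rest | word params ) | <empty>
    void parseParams(std::vector<std::string>& out) {
        if (peek() != ' ') return;  // empty alternative: end of line
        ++pos_;
        if (peek() == ':') {        // trailing parameter takes the rest
            ++pos_;
            out.push_back(s_.substr(pos_));
            pos_ = s_.size();
            return;
        }
        std::string w = word();
        if (w.empty()) throw std::runtime_error("empty parameter");
        out.push_back(w);
        parseParams(out);           // recurse on the remaining params
    }
};
```

`LineParser(":lonely").parseMessage()` throws, because a prefix must be followed by a space and a command. The recursive `parseParams` mirrors the `params` rule directly, which is what keeps this kind of parser easy to write and extend.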