
I'm working on a small Haskell project that needs to lex a very small subset of strictly formed English into tokens for semantic parsing. It's a very naïve natural language interface to a system with many different end effectors that can be issued commands. I'm currently using Alex for this, but Alex relies on its lexicon being statically compiled. The nature of the system is such that the number and even type of end effectors in the world can increase as well as decrease after compilation, so I need to be able to add or remove viable tokens from the lexicon at runtime.

I've tried looking around for dynamic lexing solutions, and the closest I could get was this Dynamic Lexer Engine that doesn't look to have been updated since 2000.

I've been considering some techniques, like taking a less high-level approach (Attoparsec, perhaps), or even wiring up a recompilation hook for Alex and separating the lexer from the rest of the application.

Are there any well-known solutions for this sort of lexical analysis? I intend to work through Natural Language Processing for the Working Programmer eventually so I can take a less simplified approach, but for now a basic lexer is what I need.

Brian Tompsett - 汤莱恩
Doug Stephen
    How much of an issue is performance? I can imagine some very straightforward solutions, depending on whether you need a great deal of efficiency or not. – sclv Feb 07 '13 at 21:45
  • @sclv Performance needs to be near real-time (in the responsiveness sense, not the determinism sense). – Doug Stephen Feb 07 '13 at 22:52
  • 1
    That Dynamic Lexer Engine isn't automatically bad just because it hasn't been updated in a long time. Maybe it's already perfect, and doesn't need to be updated. :) – Robert Harvey Feb 07 '13 at 23:29
  • @RobertHarvey Haha, while that's technically true, I just don't want to hitch my wagon to a horse that may stop working in the future as GHC evolves. – Doug Stephen Feb 08 '13 at 00:44
  • I guess I misunderstood. Your question gives the impression that the lexer is a temporary solution, not a permanent one... – Robert Harvey Feb 08 '13 at 00:45
  • @RobertHarvey it's possible that I may abandon a lexer-parser architecture, but not guaranteed. If it turns out that a robust enough dynamic lexer can be tied into a smart enough parser, it may be good enough for my needs. – Doug Stephen Feb 08 '13 at 00:48
  • How bad would it be to just use a decent map (i.e. maybe not `Data.Map` but maybe a hashmap, or a stringtrie?) If your universe of inputs is finite, I think you'd be surprised by the efficiency of this approach. – sclv Feb 11 '13 at 16:03

1 Answer


CTK is an equivalent of Parsec, but for lexing. It supports adding new combinators dynamically.
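I haven't verified CTK's exact API, but the dynamic-combinator idea it supports can be sketched by hand; every name below is made up for illustration, not taken from CTK:

```haskell
import Control.Applicative ((<|>))

-- A lexer is just a function value; nothing here is compiled statically.
newtype Lexer t = Lexer { runLexer :: String -> Maybe (t, String) }

-- Match a literal keyword and emit the given token.
keyword :: String -> t -> Lexer t
keyword kw tok = Lexer $ \s ->
  case splitAt (length kw) s of
    (pre, rest) | pre == kw -> Just (tok, rest)
    _                       -> Nothing

-- Try each alternative in order. Because the alternatives live in an
-- ordinary list, tokens can be added or removed at runtime.
anyOf :: [Lexer t] -> Lexer t
anyOf ls = Lexer $ \s -> foldr (\l acc -> runLexer l s <|> acc) Nothing ls
```

A real combinator library (CTK included) would add longest-match handling and better asymptotics, but the point stands: when the lexer is an ordinary value rather than generated code, rebuilding it after the world changes is cheap.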

Don Stewart