Is there an easy way to get lexing and parsing to run concurrently when using fslex and fsyacc?
-
What is the purpose of concurrently running fslex and fsyacc? Usually parsing is done after lexing, as the output of the lexer is the input of the parser. The lexing phase builds tokens from an input stream, then the parser checks the correctness of the tokens with respect to a grammar. So why would one run them concurrently? Can you be a bit more precise? P.S.: I'm not criticizing your question. I'm sure you have valid reasons to ask it. I'm just curious. :) – MSX Jan 23 '14 at 12:25
-
@MSX: Performance. Lexing and parsing are often performance critical (IME) and often take ~50% of the total time each, so lexing on one core and concurrently parsing the lexical tokens on another core gives a potential 2x speedup. Similarly, compression/decompression and disk IO can both be performance critical and can be done concurrently. Despite this potential there appears to be no way to do this using F# and/or .NET without doing a huge rewrite. – J D Jan 24 '14 at 10:04
-
Right. Didn't think about that. I just voted up your question. As far as I know this cannot be done in fslex and fsyacc. – MSX Jan 24 '14 at 20:24
-
@JonHarrop I'm currently doing that huge rewrite you mentioned: see my [fsharp-tools](https://github.com/jack-pappas/fsharp-tools) project. I'm currently working to get my new tools on par with fslex and fsyacc, and once that's done I'm planning to implement new backends (for generating the F# code implementing the lexer/parser). If you're still interested in this, please open a Github issue on the project so we can discuss further. – Jack P. Mar 09 '14 at 14:56
-
While I am not completely sure, I believe that fsyacc-generated parsers actually only call the associated lexer when needed, i.e., the lexer runs only when the parser needs a new token. – Yet Another Geek Apr 21 '14 at 12:12
-
Not sure if this is what you're asking but there is a similar answer here: http://stackoverflow.com/questions/6532720/lexing-and-parsing-css-hierachy – Brendan Apr 24 '14 at 20:51
-
No answer? Really? [Anyone? Anyone?](https://www.youtube.com/watch?v=uhiCFdWeQfA) [Bueller? Bueller?](https://www.youtube.com/watch?v=f4zyjLyBp64) – joce Apr 28 '14 at 03:54
-
Are you asking for 1) the ability to use multiple processors to parse a single input file, or 2) the ability to use multiple processors to parse separate files, where each file is parsed on a single processor? – Sam Harwell Jun 02 '14 at 16:29
-
@280Z28: I am asking for the ability to lex and parse concurrently, e.g. lex on one core and pass lexical tokens to a second core for parsing. – J D Jun 03 '14 at 17:29
-
@jonharrop: As far as I know, lexing and parsing are, for most tasks, not the time-critical steps in a compiler/program. They both run in **O(n)**, with **n** the size of the input, while for instance semantic analysis, liveness analysis and tiling take far more time. – Willem Van Onsem Jun 09 '14 at 08:45
-
@CommuSoft: That's great but I am not doing semantic analysis, liveness analysis or tiling because I am not writing a compiler. – J D Jun 09 '14 at 13:13
-
It's still not clear if it's possible to split work among multiple fslex instances by splitting the inputs. This would be sensible, producing a "chunky" parallelization possibility, instead of a "chatty" one. – GregC Jun 19 '14 at 16:42
-
@GregC: I am not sure what you mean by "splitting the inputs" to fslex. The input is a single stream of characters processed by a state machine that is embarrassingly sequential. I am interested in lexing and parsing concurrently. – J D Jun 20 '14 at 17:26
-
@JonHarrop: All I am saying is that maybe this should not be parallelized, because parallelization opportunities might exist at a higher level of abstraction. For example, don't parallelize C# compiler's lex/yacc stage, but parallelize working with several C# source files. – GregC Jun 21 '14 at 10:42
-
Make the lexer stream tokens to an Rx observable, let the parser be a subscriber of the lexer stream... profit :) Jokes aside, it is an interesting topic and should definitely be possible to achieve. I've never heard of any tool for it, though. – Roger Johansson Jun 23 '14 at 11:57
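The observable idea in the comment above can be sketched with F#'s built-in `Event` type (which implements `IObservable`) instead of Rx. Note the impedance mismatch it has to bridge: observables *push* tokens, while an fsyacc-generated parser *pulls* them, so a buffer is needed between the two sides. The integer "tokens" and the `pushPullPipeline` function below are illustrative stubs, not part of fslex/fsyacc.

```fsharp
open System.Collections.Concurrent

// Bridge a push-based token stream (observable) to a pull-based
// consumer (like an fsyacc parser) via a BlockingCollection buffer.
let pushPullPipeline (tokens: int list) =
    let tokenEvent = Event<int>()
    let buffer = new BlockingCollection<int>()
    // "Parser" side: subscribe and buffer every pushed token.
    use _sub = tokenEvent.Publish |> Observable.subscribe (fun t -> buffer.Add t)
    // "Lexer" side: push all tokens, then signal completion.
    tokens |> List.iter tokenEvent.Trigger
    buffer.CompleteAdding()
    // The consumer then pulls from the buffer at its own pace.
    buffer.GetConsumingEnumerable() |> List.ofSeq

printfn "%A" (pushPullPipeline [1; 2; 3])   // [1; 2; 3]
```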
1 Answer
First of all, in real-world cases lexing and parsing are time critical, especially if you need to process tokens before parsing, for example to filter and collect comments or to resolve context-dependent conflicts. In such cases the parser often waits for the lexer.
The answer to the question: yes, you can run lexing and parsing concurrently with MailboxProcessor.
The core of the idea: run the lexer inside a MailboxProcessor. The lexer produces new tokens, processes them, and posts them to the mailbox; the parser receives the next token whenever it needs one. The lexer is often faster than the parser, so it is occasionally throttled to let the parser catch up. Sample code is provided below; you can tune the timeouts and traceStep to find the optimum for your solution.
[<Literal>]
let traceStep = 200000L

// Assumed to be defined elsewhere in the full solution:
//   sr : System.IO.TextReader over the input.
// Counters are hoisted to module level so that both the tokenizer
// and getNextToken can see them.
let lastTokenNum = ref 0L
let i = ref 0L
let timeOfIteration = ref System.DateTime.Now

let tokenizerFun =
    let lexbuf = Lexing.LexBuffer<_>.FromTextReader sr
    fun (chan: MailboxProcessor<lexer_reply>) ->
        let post = chan.Post
        async {
            while not lexbuf.IsPastEndOfStream do
                lastTokenNum := 1L + !lastTokenNum
                if (!lastTokenNum % traceStep) = 0L then
                    let oldTime = !timeOfIteration
                    timeOfIteration := System.DateTime.Now
                    let mSeconds = int64 ((!timeOfIteration - oldTime).Duration().TotalMilliseconds)
                    // Throttle the lexer if it has run too far ahead of the parser.
                    if int64 chan.CurrentQueueLength > 2L * traceStep then
                        int (int64 chan.CurrentQueueLength * mSeconds / traceStep)
                        |> System.Threading.Thread.Sleep
                let tok = Calc.Lexer.token lexbuf
                // Process tokens here: filter comments, add context-dependent information, etc.
                post tok
        }

use tokenizer = new MailboxProcessor<_>(tokenizerFun)

let getNextToken (lexbuf: Lexing.LexBuffer<_>) =
    let res = tokenizer.Receive 150000 |> Async.RunSynchronously
    i := 1L + !i
    if (!i % traceStep) = 0L then
        let oldTime = !timeOfIteration
        timeOfIteration := System.DateTime.Now
        let seconds = (!timeOfIteration - oldTime).TotalSeconds
        // Report progress / timing here.
        ignore seconds
    res

let res =
    tokenizer.Start()
    Calc.Parser.file getNextToken <| Lexing.LexBuffer<_>.FromString "*this is stub*"
The full solution is available here: https://github.com/YaccConstructor/ConcurrentLexPars. It only demonstrates a complete implementation of the described idea; its performance comparison is not representative because the semantic computation is very simple and there is no token processing.
For an actual performance comparison, see the full report: https://docs.google.com/document/d/1K43g5jokNKFOEHQJVlHM1gVhZZ7vFK2g9CJHyAVtUtg/edit?usp=sharing There we compare the sequential and concurrent versions of a parser for a subset of T-SQL. Sequential: 27 sec; concurrent: 20 sec.
We also use this technique in a production T-SQL translator.
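The pattern in the answer can be distilled into a self-contained sketch: an agent ("lexer") posts tokens into a MailboxProcessor while the "parser" pulls them synchronously, exactly as `getNextToken` does above. The `Token` type, the stub token values and the summing "parser" below are illustrative only, not part of fslex/fsyacc.

```fsharp
type Token =
    | INT of int
    | EOF

let run () =
    // "Lexer" agent: emits 1..100 as stub tokens, then EOF.
    let tokenizer =
        MailboxProcessor<Token>.Start(fun chan ->
            async {
                for n in 1 .. 100 do chan.Post (INT n)
                chan.Post EOF
            })
    // "Parser": pulls one token at a time, like fsyacc's token callback,
    // and folds them (here: a sum) until EOF.
    let rec parse acc =
        match tokenizer.Receive 5000 |> Async.RunSynchronously with
        | INT n -> parse (acc + n)
        | EOF -> acc
    parse 0

printfn "%d" (run ())   // 5050
```

The lexer and parser run on different threads, and `Receive` blocks only when the queue is empty, which is the whole point of the technique.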
