I have a lexer and parser, built with sedlex and menhir in OCaml, to parse spreadsheet formulas.
The following part of the lexer defines regular expressions for the path+workbook+worksheet part before a reference. For instance, 'C:\Users\Pictures\[Book1.xlsx]Sheet1'!
of ='C:\Users\Pictures\[Book1.xlsx]Sheet1'!A1:B2
.
let first_Latin_identifier_character = [%sedlex.regexp? ('a'..'z') | ('A'..'Z') ]
let path_identifier_character = [%sedlex.regexp? first_Latin_identifier_character | decimal_digit | '_' | '-' | ':' | '\x5C' (* \ *) | ' ' | '&' | '@']
let file_identifier_character = [%sedlex.regexp? first_Latin_identifier_character | decimal_digit | '_' | '-' | ' ' | '.']
let file_suffix = [%sedlex.regexp? ".xls" | ".xlsm" | ".xlsx" | ".XLS" | ".XLSM" | ".XLSX" | ".xlsb" | ".XLSB"]
let sheet_identifier_character_in_quote = [%sedlex.regexp? Compl ('\x3A' | '\x5C' | '\x2F' | '\x3F' | '\x2A' | '\x5B' | '\x5D' | '\x27')]
let sheet_identifier_character_out_quote = [%sedlex.regexp? Compl ('\x3A' | '\x5C' | '\x2F' | '\x3F' | '\x2A' | '\x27' | '\x28' | '\x29' | '\x2B' | '\x2D' | '\x2F' | '\x2C' |'\x3D' | '\x3E' | '\x3C' | '\x3b')]
let lex_file = [%sedlex.regexp? (Star path_identifier_character), '[', (Plus file_identifier_character), file_suffix, ']']
let lex_file_wo_brackets = [%sedlex.regexp? (Star path_identifier_character), (Plus file_identifier_character), file_suffix]
let lex_sheet_in_quote = [%sedlex.regexp? Plus sheet_identifier_character_in_quote]
let lex_file_sheet_in_quote = [%sedlex.regexp? lex_file, lex_sheet_in_quote]
let lex_before = [%sedlex.regexp?
("'", lex_file_sheet_in_quote, "'!") |
("'", lex_sheet_in_quote, "'!") |
(lex_sheet_out_quote, '!') |
(lex_file, "!") |
(lex_file_wo_brackets, "!") |
("'", lex_file, "'!") |
("'", lex_file_wo_brackets, "'!")]
Without the last 4 cases of lex_before
(i.e., (lex_file, "!") | (lex_file_wo_brackets, "!") | ("'", lex_file, "'!") | ("'", lex_file_wo_brackets, "'!")
), the total time of compilation (by ocamlc
) of the project was 3 minutes 30 seconds (what took time was the compilation of lexer.ml
). After adding those 4 cases, the total time of compilation is 13 minutes 40 seconds. What takes time is always the compilation of lexer.ml
.
Does anyone know how we could identify what slows down the compilation?
Is there anything wrong in the way I write named regular expressions that slows down the compilation?