Given a list like this:
direct_SQL_statement ::=
directly_executable_statement semicolon
directly_executable_statement ::=
direct_SQL_data_statement
| SQL_schema_statement
| SQL_transaction_statement
| SQL_connection_statement
| SQL_session_statement
| direct_implementation_defined_statement
direct_SQL_data_statement ::=
delete_statement__searched
| direct_select_statement__multiple_rows
| insert_statement
| update_statement__searched
| truncate_table_statement
| merge_statement
| temporary_table_declaration
direct_implementation_defined_statement ::=
"!! See the Syntax Rules."
apostrophe ::=
"'"
/*
5.2 token and separator
Function
Specify lexical units (tokens and separators) that participate in SQL language.
Format
*/
token ::=
nondelimiter_token
| delimiter_token
identifier_part ::=
identifier_start
| identifier_extend
/*
identifier_start ::=
"!! See the Syntax Rules."
identifier_extend ::=
"!! See the Syntax Rules."
*/
large_object_length_token ::=
digit+ multiplier
Is it possible to use Perl's look-ahead assertion to break it up into a list of individual definitions?
I tried:
perl -0777ne 'print "$&\n^^\n\n" while /(?=\w+\s*::=)\w+\s*::=\s*.+/gs;'
but it just returned the whole thing (as if the look-ahead assertion were not working at all), while
perl -0777ne 'print "$&\n^^\n\n" while /(?=\w+\s*::=)\w+\s*::=\s*.+?/gs;'
comes up just too short:
direct_SQL_statement ::=
d
^^
directly_executable_statement ::=
d
^^
direct_SQL_data_statement ::=
d
^^
direct_implementation_defined_statement ::=
"
^^
I need to break it up into individual BNF definition chunks for further processing, like this for the initial test data:
direct_SQL_statement ::=
directly_executable_statement semicolon
^^
directly_executable_statement ::=
direct_SQL_data_statement
| SQL_schema_statement
| SQL_transaction_statement
| SQL_connection_statement
| SQL_session_statement
| direct_implementation_defined_statement
^^
direct_SQL_data_statement ::=
delete_statement__searched
| direct_select_statement__multiple_rows
| insert_statement
| update_statement__searched
| truncate_table_statement
| merge_statement
| temporary_table_declaration
^^
direct_implementation_defined_statement ::=
"!! See the Syntax Rules."
^^
Notes:
- The above output is from the initial test data.
- The whole "A ::= B" thing is called a BNF definition. The "^^" is only a visual indication that the separation is done properly.
- The apostrophe definition and the token definition that follows it are different BNF definitions and should be treated as such. The /* ... */ comments should be filtered out of the output.
- Comments may come without empty lines surrounding them. That's the reason I need to rely on the look-ahead assertion instead of paragraph mode.
- The question comes as a follow-up to "How can EBNF or BNF be parsed?", of which the solution is "W3C EBNF doesn't end a production with a semicolon because a ::= operator comes after the LHS symbol of a new production."
- The whole file can be found at github.com/ronsavage/SQL/blob/master/sql-2016.ebnf