I have a file of macros exported from SAS
that I want to parse in order to build documentation with R and markdown
(I can't use existing external software due to security limitations at work).
Specifically, I want to extract:
- the name of the macro
- parameters and their description
- contents of two sections named USES and EXAMPLES
- the body of the macro (its code)
Unfortunately, my lack of regex skills is hurting me again, even though I don't think the rules are that complicated.
Here is an example input, followed by the expected output:
my_text <- "
%macro macro_name_1
/*----------------------------------------------------------------------------------
optional macro description on one or several lines.
this section always starts with slash star dashes and ends with dashes star slash
and it never contains these combinations of characters in the text
----------------------------------------------------------------------------------*/
(param1 /* optional description of param1 */
,param2
,param3 /* optional description of param3 */
);
/* USES:
some info on one or several lines,
always starts with 'slash star USES:'
and ends with 'star slash'
but doesn't contain these combinations of characters
*/
/* EXAMPLES:
some examples on one or several lines,
always starts with 'slash star EXAMPLES:' OR 'slash star EXAMPLE:'
and ends with 'star slash'
but doesn't contain these combinations of characters
*/
some code on one or several lines,
always after USES and EXAMPLE(S) sections
that may or may not contain combinations of /* and */
%mend;
some text outside of a macro-mend pattern, which I wish to ignore
%macro macro_name_2
/*---------------------
desc of macro_name_2
---------------------*/
(x
,y /* desc of y*/
);
/* USES: something */
/* EXAMPLE:
example for macro_name2
*/
code2
%mend;
some more irrelevant text
%macro macro_name_3;
code3
%mend;
"
The output doesn't have to be identical to what I propose here, but it should have at least a similar structure (text is abbreviated for readability):
expected_output <- tibble::tribble(
  ~macro_name, ~description, ~parameters, ~uses, ~examples, ~code,
  "macro_name_1", "optional macro...",
    list(param1 = "optional desc...", param2 = "", param3 = "optional desc..."),
    "Some info...", "some examples...", "some code...",
  "macro_name_2", "desc of macro_name_2",
    list(x = "", y = "desc of y"),
    "something", "example for...", "code2",
  "macro_name_3", "", list(), "", "", "code3"
)
# # A tibble: 3 x 6
#   macro_name   description          parameters uses         examples         code
#   <chr>        <chr>                <list>     <chr>        <chr>            <chr>
# 1 macro_name_1 optional macro...    <list [3]> Some info... some examples... some code...
# 2 macro_name_2 desc of macro_name_2 <list [2]> something    example for...   code2
# 3 macro_name_3                      <list [0]>                               code3
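
To make the rules above concrete, here is a rough base R sketch of the first two steps only: isolating each %macro ... %mend; block with its name, and pulling out a labelled comment section such as USES or EXAMPLE(S). The helper names extract_macro_blocks and extract_section and the small sample_text are hypothetical and used only for illustration. The sketch assumes macros are never nested and that the literal text %mend; never appears inside a macro body; it does not parse the description or the parameters.

```r
# Hypothetical helper: split a single string into "%macro ... %mend;" blocks
# and pull out each macro's name. Base R only, no external packages.
extract_macro_blocks <- function(text) {
  # (?s) lets "." match newlines; ".*?" is lazy, so each match stops at the
  # first "%mend;" and any text between macros is skipped.
  pattern <- "(?s)%macro\\s+(\\w+)(.*?)%mend;"
  blocks <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  data.frame(
    macro_name = sub(pattern, "\\1", blocks, perl = TRUE),
    # Everything between the name and "%mend;", still to be parsed further.
    remainder = trimws(sub(pattern, "\\2", blocks, perl = TRUE)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: return the text of the first "/* LABEL: ... */" comment
# in one block, or "" if there is none. `label` is itself a regex fragment.
extract_section <- function(block, label) {
  pattern <- paste0("(?s)/\\*\\s*", label, ":(.*?)\\*/")
  m <- regmatches(block, regexpr(pattern, block, perl = TRUE))
  if (length(m) == 0) return("")
  trimws(sub(pattern, "\\1", m, perl = TRUE))
}

# Small made-up input following the same layout as my_text.
sample_text <- "
%macro m1
(a /* desc of a */
,b
);
/* USES: something */
/* EXAMPLE:
%m1(1, 2)
*/
code1
%mend;
ignored text
%macro m2;
code2
%mend;
"

blocks <- extract_macro_blocks(sample_text)
uses <- vapply(blocks$remainder, extract_section, character(1),
               label = "USES", USE.NAMES = FALSE)
# "EXAMPLES?" accepts both EXAMPLE: and EXAMPLES:
examples <- vapply(blocks$remainder, extract_section, character(1),
                   label = "EXAMPLES?", USE.NAMES = FALSE)
```

The same two helpers can be called on my_text. The lazy quantifier is the key design choice here: a greedy .* would swallow everything from the first %macro to the last %mend; in one match.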