I'm working on improving the existing Stata grammar for Atom, the language-stata package. Stata code follows a pattern: the first word of a line is a command, and a comma separates the options from the objects of the command. For example, to run a linear regression of y on x without a constant, you run:
regress y x, noconstant
A triple slash means that the command continues on the following line, and the rest of that line after the slashes is a comment. Thus the previous code is equivalent to:
regress /// COMMENTS
y x, /// MORE COMMENTS
noconstant
I think that the grammar should highlight the first word of every line, unless the previous line contains a triple slash. In the two examples above, it should highlight the command regress, but it should not highlight the words y or noconstant in the second example. I imagine something like:
- Start capturing at the beginning of a line;
- Highlight the first word;
- Continue capturing as long as lines contain a triple slash;
- Stop when I find the end of a line without a triple slash.
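To make the intended behaviour concrete, here is a minimal sketch of those steps in plain Python, outside the grammar. The function name command_words and the sample input are made up for illustration, and the sketch simply treats any line containing `///` as continued; it is not how the grammar would be written, only a description of the result I expect.

```python
import re

CONTINUATION = "///"
FIRST_WORD = re.compile(r"\s*(\w+)")


def command_words(source: str) -> list[tuple[int, str]]:
    """Return (line_number, word) for every word that should be highlighted.

    Hypothetical helper: a word is highlighted when it is the first word of
    a line and the previous line did not contain a triple slash.
    """
    highlighted = []
    continued = False  # True when the previous line contained "///"
    for number, line in enumerate(source.splitlines(), start=1):
        if not continued:
            match = FIRST_WORD.match(line)
            if match:
                highlighted.append((number, match.group(1)))
        # The command continues on the next line if this line has "///".
        continued = CONTINUATION in line
    return highlighted


sample = """regress y x, noconstant
regress y x, /// COMMENTS
noconstant"""

print(command_words(sample))  # [(1, 'regress'), (2, 'regress')]
```

The word noconstant on the third line is skipped because the second line contains a triple slash, which is exactly the case the grammar rules below get wrong.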
I've tried a few things. For instance:
{
  name: 'comment.line.stata'
  match: '///.*'
}
{
  begin: '^\\s*(\\w+)'
  end: '(?<!///)$'
  beginCaptures:
    "1":
      name: 'support.function.stata'
}
This code highlights the first word of every line, whether or not a triple slash preceded it. On the other hand,
{
  name: 'comment.line.stata'
  match: '///.*'
}
{
  begin: '^\\s*(\\w+)'
  while: '///'
  beginCaptures:
    "1":
      name: 'support.function.stata'
}
highlights the first word of the document and nothing else.
Does anyone have an idea how to solve this? Thanks!