I'm working on improving language-stata, the existing Stata grammar package for Atom. Stata code follows a pattern: the first word of a line is a command, and a comma separates the objects of the command from its options. For example, to run a linear regression of y on x without a constant, you run:

regress y x, noconstant

A triple slash means that the command continues on the following line. Thus the previous code is equivalent to:

regress y /// COMMENTS
x, /// MORE COMMENTS
noconstant

I think the grammar should highlight the first word of every line, unless the previous line contains a triple slash. In the two examples above, it should highlight the command regress, but it should not highlight the words x or noconstant in the second example. I imagine something like:

  1. Start capturing at the beginning of a line;
  2. Highlight the first word;
  3. Continue capturing as long as lines contain a triple slash;
  4. Stop when I find the end of a line without a triple slash.
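
To make the intended behavior concrete, here is a minimal Python sketch of the four steps above. It is not a grammar and is not part of language-stata; the function name `command_words` is hypothetical, and it assumes that any line containing `///` continues onto the next line.

```python
import re

# A command word is assumed to be the first run of word characters on a line.
FIRST_WORD = re.compile(r"\s*(\w+)")

def command_words(source):
    """Return (line_number, word) for each line that starts a new command."""
    commands = []
    continued = False  # True when the previous line contained a triple slash
    for number, line in enumerate(source.splitlines(), start=1):
        if not continued:
            # Steps 1 and 2: a new command starts here; record its first word.
            match = FIRST_WORD.match(line)
            if match:
                commands.append((number, match.group(1)))
        # Steps 3 and 4: keep capturing only while the line has a triple slash.
        continued = "///" in line
    return commands

multi_line = "regress y /// COMMENTS\nx, /// MORE COMMENTS\nnoconstant"
print(command_words(multi_line))  # [(1, 'regress')]
```

Only the word on line 1 is reported for the multi-line example, which is the highlighting I would like the grammar to produce.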

I've tried a few things. For instance:

{
    name: 'comment.line.stata'
    match: '///.*'
}
{
    begin: '^\\s*(\\w+)'
    end: '(?<!///)$'
    beginCaptures:
        "1":
            name: 'support.function.stata'
}

This code highlights the first word of every line, whether or not a triple slash preceded it. On the other hand,

{
    name: 'comment.line.stata'
    match: '///.*'
}
{
    begin: '^\\s*(\\w+)'
    while: '///'
    beginCaptures:
        "1":
            name: 'support.function.stata'
}

highlights the first word of the document and nothing else.
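
A possible explanation for the first attempt, sketched with Python's `re` module as a stand-in for the grammar's regex engine (an assumption; the two engines are not identical): the lookbehind in `(?<!///)$` only inspects the three characters immediately before the end of the line, so any text after the triple slash lets the end pattern match. The helper name `region_ends_on` is hypothetical.

```python
import re

# End pattern copied from the first attempt above.
END_ATTEMPT = re.compile(r"(?<!///)$")

def region_ends_on(line):
    """Return True if the end pattern matches somewhere in this line."""
    return END_ATTEMPT.search(line) is not None

print(region_ends_on("regress x /// COMMENTS"))  # True
print(region_ends_on("regress x ///"))           # False
```

Under this reading, the region would stay open only when a line ends exactly with `///`, with no comment text after it.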

Does anyone have an idea of how to solve this? Thanks!

Luca B.
  • I tried to be more specific! How does it sound? – Luca B. Apr 08 '16 at 22:56
  • But it does not recognize any pattern if I use `regress y x, noconstant`. – Luca B. Apr 09 '16 at 03:21
  • I tried to clarify and simplify the problem a bit. I removed the part about the comma. Does it seem clearer? – Luca B. Apr 09 '16 at 13:29
  • What do you want to highlight in the first example: `regress y x, noconstant`? Please try [this pattern](https://regex101.com/r/vL4oW6/5), but it is not so smart:( – Quinn Apr 09 '16 at 18:43
  • Your code inspired me to find a solution! I posted it below. I just had to generalize your code a bit. – Luca B. Apr 09 '16 at 20:22
  • I'm not familiar with Stata. Nice to see you got an answer finally. :) – Quinn Apr 09 '16 at 20:31

1 Answer

Building on comments by ccf and this discussion, I came up with something that seems to work:

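# Everything from a triple slash to the end of the line is a comment.
{
    name: 'comment.line.stata'
    match: '///.*'
}
# A line containing a triple slash opens a multi-line command.
# Only its first word is highlighted.
{
    begin: '^\\s*([a-zA-Z_]+)(.*)(///.*)$'
    beginCaptures:
        "1":
            name: 'support.function.stata'
    # Close on the first following line that contains no triple slash.
    end: '^([^/]+|/($|[^/]|/($|[^/])))*$'
}
# Any other line is a single-line command: highlight its first word.
{
    captures:
        "1":
            name: 'support.function.stata'
    match: '^\\s*([a-zA-Z_]+)'
}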
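
As a rough sanity check of the begin, end, and single-line patterns above, here is a Python sketch that applies them line by line. It is a simplified model, not how Atom tokenizes: Python's `re` stands in for the grammar's regex engine, the region logic is reduced to a flag, and the function name `highlighted_commands` is hypothetical.

```python
import re

# Patterns copied from the grammar rules above.
BEGIN = re.compile(r"^\s*([a-zA-Z_]+)(.*)(///.*)$")
END = re.compile(r"^([^/]+|/($|[^/]|/($|[^/])))*$")
SINGLE = re.compile(r"^\s*([a-zA-Z_]+)")

def highlighted_commands(source):
    """Return the words the rules would scope as support.function.stata."""
    highlighted = []
    in_region = False  # True while inside a begin/end region
    for line in source.splitlines():
        if in_region:
            # Inside a multi-line command: only look for the closing line.
            if END.match(line):
                in_region = False
            continue
        begin = BEGIN.match(line)
        if begin:
            highlighted.append(begin.group(1))
            in_region = True
            continue
        single = SINGLE.match(line)
        if single:
            highlighted.append(single.group(1))
    return highlighted

# Lines are kept short: Python's backtracking engine can be slow on the
# END pattern for long lines that contain a triple slash.
source = (
    "regress y /// COMMENTS\n"
    "x, /// MORE COMMENTS\n"
    "noconstant\n"
    "regress y x, noconstant"
)
print(highlighted_commands(source))  # ['regress', 'regress']
```

In this model only the two regress commands are highlighted, and the continuation lines are skipped.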
Luca B.