
I am trying to delete all comments from a VHDL file with sed and a regular expression.

VHDL comments start with --; the rest of the line after that is the comment.

My first approach was: sed -i 's/--.*//g' file.vhdl

This deletes all comments, but the file can also contain assignments with don't-care values, which are written with the symbol -. Assignments like sig1 <= "11--000" are therefore affected as well. Assignments can also be concatenations, such as sig1 <= "0--" & "--1". Is there a good regex that covers all these cases? Perhaps one that matches from the end of the line, since an assignment has to end with a ; ?

A test file which covers all the cases:

-- comment start of line
architecture beh of ent_name is
    signal sig1 : std_logic_vector(6 downto 0); -- comment end of line
begin
proc: process (sensitivity)
begin
    sig1 <= "0--11-1"; -- another comment
    sig1 <= "0--11--";
    sig1 <= "00--" & "--1"; -- yet another
    sig1 <= "00--" & "--1";
end process proc;
end beh;

Thanks!

MartinM
  • Out of interest, what is your reason for deleting comments? – scary_jeff Oct 18 '17 at 11:03
  • The files are user-submitted and are automatically checked for certain keywords. For example, the students have to use predefined entities, so I check for the occurrence of the entity name. I don't want them to trick the system by writing the name in a comment. Conversely, if I prohibit the wait statement and someone writes a comment containing wait, the file would be rejected. – MartinM Oct 18 '17 at 11:07
  • Oh, nice idea. In case it matters, your test code does not cover the case where there is a double quote inside a comment. – scary_jeff Oct 18 '17 at 11:26
  • What about VHDL-2008 block comments? :-) – Matthew Taylor Oct 18 '17 at 11:40
  • char <= '"'; -- Assign a " to char – lasplund Oct 18 '17 at 16:26
  • Are any of the language preprocessors - like cpp/M4, ... configurable WRT comment indicators? They may be able to do that for you. – Jim Lewis Aug 05 '21 at 14:11
  • On a side note, with the VHDL-2008 changes, ?=, ..., I am thinking that we should not use "-" as don't care in an assignment, but instead use "X". In synthesis, both are interpreted as don't care in the assignment. This means that any value can be applied to that output. However, in simulation, with VHDL-2008 ?=, "-" on an input in a comparison is treated as don't care or match any character. It would be bad to assign something a value of '-' and then later have that '-' be used on the input to a ?=. It is questionable that the implemented hardware would match the RTL simulation. – Jim Lewis Aug 05 '21 at 14:16

3 Answers


Using a parser would be a better solution.

If you can't use one, exclude what you don't want from the pattern. Here that means no quotation mark between the -- and the end of the line:

--[^"]*?$

This certainly doesn't cover all cases, but in your example it should work.
Demo here.
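As a rough check of that pattern, here is a small Python sketch that applies it line by line to lines from the question's test file. The function name `strip_comments_naive` is made up for this illustration. The lazy `*?` is written as a plain `*`: when each line is processed on its own, both forms match the same text because the pattern is anchored at the end of the line. sed's POSIX regular expressions have no lazy quantifiers, so the plain form is also what you would write there.

```python
import re

# Pattern from the answer: "--" followed by no double quote up to the end of the line.
COMMENT = re.compile(r'--[^"]*$')

def strip_comments_naive(text: str) -> str:
    """Apply the pattern to each line separately and drop trailing blanks."""
    return "\n".join(COMMENT.sub("", line).rstrip() for line in text.splitlines())

sample = '''-- comment start of line
    sig1 <= "0--11-1"; -- another comment
    sig1 <= "00--" & "--1"; -- yet another
    sig1 <= "0--11--";'''

print(strip_comments_naive(sample))

# Known limitation: a double quote inside the comment defeats the pattern,
# so this line is printed unchanged.
print(strip_comments_naive('x <= y; -- a "quoted" word'))
```

The second call shows the case raised in the comments under the question: as soon as the comment itself contains a `"`, the pattern no longer matches and the comment stays in the file.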

PJProudhon
  • The possibility of a `"` in a comment is exactly why a code parser would definitely be a better solution. Even with .NET balancing groups or PCRE recursive constructs we can't guarantee that code is parsed accurately; regexes are not meant for such tasks (you got my upvote for the point). – PJProudhon Oct 19 '17 at 04:11

Quoting IEEE 1076-2008:

15.9 Comments

A comment is either a single-line comment or a delimited comment. A single-line comment starts with two adjacent hyphens and extends up to the end of the line. A delimited comment starts with a solidus (slash) character immediately followed by an asterisk character and extends up to the first subsequent occurrence of an asterisk character immediately followed by a solidus character.

An occurrence of two adjacent hyphens within a delimited comment is not interpreted as the start of a single-line comment. Similarly, an occurrence of a solidus character immediately followed by an asterisk character within a single-line comment is not interpreted as the start of a delimited comment. Moreover, an occurrence of a solidus character immediately followed by an asterisk character within a delimited comment is not interpreted as the start of a nested delimited comment.

A single-line comment can appear on any line of a VHDL description and may contain any character except the format effectors vertical tab, carriage return, line feed, and form feed. A delimited comment can start on any line of a VHDL description and may finish on the same line or any subsequent line. The presence or absence of comments has no influence on whether a description is legal or illegal. Furthermore, comments do not influence the execution of a simulation module; their sole purpose is to enlighten the human reader.

Examples:

-- The last sentence above echoes the Algol 68 report.
end; -- Processing of LINE is complete.
----------- The first two hyphens start the comment.
/* A long comment may be written
    on several consecutive lines */
x := 1; /* Comments /* do not nest */

NOTE 1—Horizontal tabulation can be used in comments, after the starting characters, and is equivalent to one or more spaces (SPACE characters) (see 15.3).

NOTE 2—Comments may contain characters that, according to 15.2, are non-printing characters. Implementations may interpret the characters of a comment as members of ISO/IEC 8859-1:1998, or of any other character set; for example, an implementation may interpret multiple consecutive characters within a comment as single characters of a multi-byte character set.

Given this, it seems impossible to achieve your goal with a regular expression alone, because you have to interpret the text preceding the -- to know whether it starts a comment or sits inside a string. You will likely need a VHDL parser that handles the language specifics. You could look into the prettyprint code that Stack Overflow uses. It seems to detect comments quite well.

JHBonarius
  • Comments are lexical elements typically discarded as not affecting the meaning of a VHDL specification. Historically there are pragmas implemented as comments, intended to be supplanted by -2008 tool directives. Lexical analyzers are a *complete* ordered set of regular expression analyzers capable of detecting all valid lexical elements. Pretty printers or syntax highlighters typically don't provide a complete set without which you may depend on style conventions. –  Oct 18 '17 at 21:11
  • [What is syntax highlighting and how does it work?](https://meta.stackexchange.com/questions/184108/what-is-syntax-highlighting-and-how-does-it-work) for all Stack Exchange Q&A sites leads us to [lang-vhdl.js](https://github.com/google/code-prettify/blob/master/src/lang-vhdl.js), which implements an incomplete lexical analyzer. Note that strings are evaluated before comments. The REs' evaluation order is defined by the standard. –  Oct 18 '17 at 22:47
  • If you [look closely](https://i.stack.imgur.com/ofDY5.jpg) the Prettify syntax highlighter used here is susceptible to highlighting errors because it's not complete. See the answer talking about Issue Report IR1045 [here](https://stackoverflow.com/questions/43159960/lexing-the-vhdl-tick-token/43160723#43160723). It's an example of why you should really have a complete lexical analyzer. –  Oct 18 '17 at 22:58

Perl has a nice expression for removing C // and /* ... */ comments while paying attention to quoted strings. I'll see if I can modify it for "--" instead of //. I need this for Ada, which has a similar comment syntax (VHDL borrowed syntax from Ada and C), and will post when I've worked it out.
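In the meantime, here is a sketch of the same idea in Python rather than Perl. It is not the Perl expression mentioned above, only an illustration of the technique: one alternation matches either a literal or a comment, and the replacement keeps literals and drops comments. The names `TOKEN` and `strip_line_comments` are made up, and only "--" comments are handled, not VHDL-2008 /* ... */ comments.

```python
import re

# The scan runs left to right, so a literal that starts before a "--"
# consumes it before the comment branch gets a chance to match.
# Group 1: a string literal or a three-character character literal.
TOKEN = re.compile(r'''("[^"\n]*"|'[^\n]')|--[^\n]*''')

def strip_line_comments(text: str) -> str:
    """Keep literals (group 1) and replace "--" comments with nothing."""
    return TOKEN.sub(lambda m: m.group(1) or "", text)

print(strip_line_comments('sig1 <= "00--" & "--1"; -- yet another'))
print(strip_line_comments('char <= \'"\'; -- Assign a " to char'))
```

The character-literal branch is a simplification: a tick can also introduce an attribute or a qualified expression, so unusual code can still be lexed incorrectly.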