
In VHDL, the ' character can either delimit a character literal, e.g. '.', or act as an attribute separator (similar to C++'s :: token), e.g. string'("hello").

The issue comes up when parsing an attribute name that contains a character literal, e.g. string'('a','b','c'). Here a naive lexer incorrectly tokenizes the first '(' as a character literal, and every actual character literal that follows is then tokenized incorrectly as well.
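To make the failure concrete, here is a minimal sketch of the naive rule in C. The helper name `naive_is_char_literal` is made up for illustration and is not taken from any real lexer; it simply treats any quote, character, quote triple as a character literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive rule (hypothetical helper): a ' followed by any character and
 * another ' is taken to be a character literal, with no context check. */
static int naive_is_char_literal(const char *src, size_t i)
{
    assert(src != NULL && i < strlen(src));
    /* The src[i + 1] test keeps the src[i + 2] read inside the string. */
    return src[i] == '\'' && src[i + 1] != '\0' && src[i + 2] == '\'';
}
```

For the input string'('a','b','c'), the rule fires at index 6 and consumes '(' as a character literal. Scanning then resumes at a, and the rule fires again at index 10, consuming ',' as a character literal, so none of the real literals are recognized.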

A 2007 thread in the comp.lang.vhdl Google group, titled "Lexing the ' char", asks a similar question and has an answer by user diogratia:

        case '\'':                          /* IR1045 check */

            if (    last_token == DELIM_RIGHT_PAREN ||
                    last_token == DELIM_RIGHT_BRACKET ||
                    last_token == KEYWD_ALL ||
                    last_token == IDENTIFIER_TOKEN ||
                    last_token == STR_LIT_TOKEN ||
                    last_token == CHAR_LIT_TOKEN || ! (buff_ptr<BUFSIZ-2) )
                token_flag = DELIM_APOSTROPHE;
            else if (is_graphic_char(NEXT_CHAR) &&
                    line_buff[buff_ptr+2] == '\'') { CHARACTER_LITERAL:
                buff_ptr+= 3;               /* lead,trailing \' and char */
                last_token = CHAR_LIT_TOKEN;
                token_strlen = 3;
                return (last_token);
            }
            else token_flag = DELIM_APOSTROPHE;
            break;

See Issue Report IR1045: http://www.eda-twiki.org/isac/IRs-VHDL-93/IR1045.txt

As the code fragment above shows, the last token can be captured and used to disambiguate something like:

  foo <= std_logic_vector'('a','b','c');

without a large look ahead or backtracking.

However, as far as I know, flex doesn't track the last token that was parsed.

Without having to manually keep track of the last parsed token, is there a better way to accomplish this lexing task?

I am using IntelliJ GrammarKit if that helps.

DanChianucci

1 Answer


The idea behind IR1045 is to be able to tell whether a single quote/apostrophe is part of a character literal without looking ahead, or backtracking when you're wrong. Try:

library ieee;
use ieee.std_logic_1164.all;

entity foo is
    port (
        a:      in      std_logic;
        b:      out     std_logic_vector (3 downto 0)
    );
end entity;

architecture behave of foo is
    begin
    b <= std_logic_vector'('0','1','1','0')     when a = '1' else
         (others =>'0')                         when a = '0' else
         (others => 'X');
end architecture behave;

How far ahead are you willing to look?

There is however a practical example of flex disambiguation of apostrophes and character literals for VHDL.

Nick Gasson's nvc uses flex, in which he implemented an Issue Report 1045 solution.

See the nvc/src/lexer.l which is licensed under GPLv3.

Search for last_token:

#define TOKEN(t) return (last_token = (t))

and

#define TOKEN_LRM(t, lrm)                                       \
   if (standard() < lrm) {                                      \
      warn_at(&yylloc, "%s is a reserved word in VHDL-%s",      \
              yytext, standard_text(lrm));                      \
      return parse_id(yytext);                                  \
   }                                                            \
   else                                                         \
      return (last_token = (t));

An added function to check it:

static int resolve_ir1045(void);

static int last_token = -1;

which is:

%%

static int resolve_ir1045(void)
{
   // See here for discussion:
   //   http://www.eda-stds.org/isac/IRs-VHDL-93/IR1045.txt
   // The set of tokens that may precede a character literal is
   // disjoint from that which may precede a single tick token.

   switch (last_token) {
   case tRSQUARE:
   case tRPAREN:
   case tALL:
   case tID:
      // Cannot be a character literal
      return 0;
   default:
      return 1;
   }
}

The IR1045 location has changed since the comp.lang.vhdl post. It's now:

http://www.eda-twiki.org/isac/IRs-VHDL-93/IR1045.txt

You'll also want to search for resolve_ir1045 in lexer.l.

static int resolve_ir1045(void);

and

{CHAR}            { if (resolve_ir1045()) {
                       yylval.s = strdup(yytext);
                       TOKEN(tID);

Here we find that nvc uses the function as a filter when detecting the first single quote of a character literal.

This was originally an Ada issue. IR-1045 was never adopted but is universally used. There are probably Ada flex lexers that also demonstrate disambiguation.

The requirement to disambiguate is discussed in Ada User Journal volume 27 number 3 from September 2006, in the article "Lexical Analysis" on PDF pages 30 and 31 (Volume 27 pages 159 and 160), where we see the solution is not well known.

The comment that character literals do not precede a single quote is inaccurate:

entity ir1045 is
end entity;

architecture foo of ir1045 is
begin
THIS_PROCESS:
    process
        type twovalue is ('0', '1');  
        subtype string4 is string(1 to 4);
        attribute a: string4;
        attribute a of '1' : literal is "TRUE";
    begin
        assert THIS_PROCESS.'1''a /= "TRUE"
            report "'1''a /= ""TRUE"" is FALSE";
        report "This_PROCESS.'1''a'RIGHT = " &
            integer'image(This_PROCESS.'1''a'RIGHT);
        wait;
    end process;
end architecture;

The first use of an attribute whose prefix is a selected name with a character literal suffix demonstrates the inaccuracy. The second report statement shows it can matter:

ghdl -a ir1045.vhdl
ghdl -e ir1045
ghdl -r ir1045
ir1045.vhdl:13:9:@0ms:(assertion error): '1''a /= "TRUE" is FALSE
ir1045.vhdl:15:9:@0ms:(report note): This_PROCESS.'1''a'RIGHT = 4
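The effect on a lexer can be sketched with a small hand-written tokenizer in C. This is an illustration under stated assumptions, not nvc's or ghdl's code: the token names, the `lex` function, and the `char_precedes_tick` flag are all hypothetical, and right brackets and the keyword all are left out for brevity (all would lex as an identifier here). The flag selects whether a character literal counts as a token after which ' can only be an apostrophe.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch only. */
enum tok { T_NONE, T_ID, T_TICK, T_CHAR, T_LPAREN, T_RPAREN, T_COMMA, T_OTHER };

/* IR1045-style check: after these tokens a ' cannot start a character
 * literal. char_precedes_tick adds the character-literal case. */
static int tick_only_after(enum tok last, int char_precedes_tick)
{
    return last == T_ID || last == T_RPAREN ||
           (char_precedes_tick && last == T_CHAR);
}

/* Tokenize src into out[] (at most max entries); returns the token count. */
static size_t lex(const char *src, int char_precedes_tick,
                  enum tok *out, size_t max)
{
    enum tok last = T_NONE;
    size_t n = 0, i = 0;

    assert(src != NULL && out != NULL);
    while (src[i] != '\0' && n < max) {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) { i++; continue; }
        if (isalpha(c)) {
            /* Identifier: letters, digits and underscores. */
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
            last = T_ID;
        } else if (c == '\'') {
            /* Only the previous token and two characters are inspected. */
            if (!tick_only_after(last, char_precedes_tick) &&
                src[i + 1] != '\0' && src[i + 2] == '\'') {
                i += 3;
                last = T_CHAR;
            } else {
                i += 1;
                last = T_TICK;
            }
        } else {
            last = c == '(' ? T_LPAREN : c == ')' ? T_RPAREN
                 : c == ',' ? T_COMMA : T_OTHER;
            i++;
        }
        out[n++] = last;
    }
    return n;
}
```

With the flag set, '1''a'RIGHT lexes as character literal, apostrophe, identifier, apostrophe, identifier. With the flag cleared, the second quote starts a bogus character literal 'a' and the result is character literal, character literal, identifier. std_logic_vector'('a','b','c') lexes the same way in both cases.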

In addition to an attribute name prefix containing a selected name with a character literal suffix there's a requirement that an attribute specification 'decorate' a declared entity (of an entity_class, see IEEE Std 1076-2008 7.2 Attribute specification) in the same declarative region the entity is declared in.

This example is syntactically and semantically valid VHDL. You could note that nvc doesn't allow decorating a named entity with the entity class literal. That's not according to 7.2.

Enumeration literals are declared in type declarations, here type twovalue. An enumerated type that has at least one character literal as an enumeration literal is a character type (5.2.2.1).

  • Why did these troubles pop up in the 2000s when the language features involved (literals vs. type qualification vs. attributes with one-character names) are much older? As far as I know, most VHDL parsers are hand-written and don't rely on parser/lexer generators. In most cases you end up writing a lot of special rules to circumvent VHDL's special cases. – Paebbels Apr 02 '17 at 00:22
  • You can count the number of commercial VHDL implementations that have been started from scratch since around 1991 on the fingers of one hand and still have fingers left over. It's generally the adventuresome who find things don't behave and no one left them a trail of breadcrumbs in the documentation. I used to think that was zero sum competition - someone has to lose market share for someone else to gain market share so you avoid giving help. Nowadays the state of the art is multi-language simulation and synthesis. A start up can't compete anyway, that field is patent rich. –  Apr 02 '17 at 01:38
  • @Paebbels Ada's the same way. When was the last time there was a new Ada compiler? –  Apr 02 '17 at 01:42
  • There is a Visual Studio plugin and an Eclipse plugin (Sigasi). Then there is this new VHDL frontend provider used by several other vendors. I'm working on a Python version of a VHDL parser ([pyVHDLParser](https://github.com/Paebbels/pyVHDLParser)). I know that many big companies have their own internal VHDL tools, which are not public and not available to buy. Oh, and Xilinx bought a new frontend for Vivado developed by a third company. That explains the decrease in supported features compared to XST. – Paebbels Apr 02 '17 at 01:51
  • I'd imagine the commonality between different HDLs is responsible for feature decline since XST, with a dash of system level synthesis. Pork (Verilog) and beef (VHDL) are being forced through the same meat grinder. It's a side effect of multi-language support with the same tools. –  Apr 02 '17 at 02:24