I am dabbling with the Object Pascal Engine (by Rob van den Brink) and it seems (except for a few minor and easily correctable errors) it works for Delphi unit files.

However, it has problems parsing Project (.dpr) and Package (.dpk) files. The issue boils down to the difference between what a 'uses' clause can contain in a unit and what it can contain in a project (as well as what a 'contains' clause can contain in a package).

Let me give simple examples:

In a unit (.pas) file, the 'uses' clause can be something like this

uses
  Windows,
  Messages,
  SysUtils,
  Variants,
  Classes,
  Graphics,
  Controls,
  Forms,
  Dialogs,
  StdCtrls,
  ExtCtrls,
  ComCtrls;

whereas, in a Project (.dpr) file

uses
  Forms,
  UnitDemoMain in 'UnitDemoMain.pas' {Form1},
  SomeUnit in '..\SomeUnit.pas',
  SomeOtherUnit;

In a Package (.dpk) file, the same construct appears under the name 'contains':

contains
  OneUnit in 'OneUnit.pas',
  AnotherUnit in '..\AnotherUnit.pas';

The problem with the grammar file I have (from the above link) is that it only handles the simplest case (i.e. the way 'uses' occurs in unit files), and throws an error for the others.

I am guessing it boils down to how 'IdList' is defined in the grammar file, which is this:

<IdList> ::= <IdList> ',' <RefId>
| <RefId>

My question, then, is: How do I alter this definition so that it can handle the other alternatives (as seen in Project and Package files), i.e.:

UnitDemoMain in 'UnitDemoMain.pas' {Form1},
OneUnit in 'OneUnit.pas';
Adem

1 Answer

I haven't used the Gold package myself yet, but I have used Yacc quite a bit; that has a slightly different grammar layout but the principle is the same.

For starters I would try modifying the Delphi grammar as follows:

Change

<UsesClause>        ::= USES <IdList> ';'
              | SynError

to

<UsesClause>        ::= USES <UnitList> ';'
              | SynError

and add

<UnitList>      ::= <UnitList> ',' <UnitRef>
              | <UnitRef>

<UnitRef>       ::= <RefID>
              | <RefID> IN StringLiteral
!                 | <RefID> in <StringLiteral> Comment Start <RefID> Comment End

The line which I've commented out using the exclamation mark was initially intended to handle this construct in your example:

  UnitDemoMain in 'UnitDemoMain.pas' {Form1},

However, it seems that Gold's Builder treats the open- and close-curly-brace characters, { }, as a special case which seems to prevent them being used as anything other than to surround comments; I've been unable to find a way of using them as part of a grammar rule. The result of this change should hopefully be that '{Form1}' is simply ignored as a comment, and the example construct matches the previous variant ("<RefID> IN StringLiteral") instead.
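To make the intended behaviour concrete, here is a minimal Python sketch of the same idea, independent of Gold: brace comments are dropped by the tokenizer as if they were whitespace, and each unit reference is either a bare identifier or an identifier followed by IN and a string literal. The names `tokenize` and `parse_unit_clause` are made up for this illustration and are not part of Gold or of the D7 grammar.

```python
import re

# One token per match: brace comment, Pascal string, identifier, punctuation, whitespace.
TOKEN = re.compile(r"""
    (?P<comment>\{[^}]*\})              # brace comment, discarded like whitespace
  | (?P<string>'(?:[^']|'')*')          # string literal, '' is an escaped quote
  | (?P<ident>[A-Za-z_][A-Za-z0-9_.]*)  # identifier, dots allowed for unit names
  | (?P<punct>[,;])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, value) pairs, skipping whitespace and {...} comments."""
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup not in ("comment", "space"):
            tokens.append((m.lastgroup, m.group()))
    return tokens


def parse_unit_clause(text):
    """Parse a 'uses' or 'contains' clause into [(unit_name, path_or_None), ...]."""
    tokens = tokenize(text)
    pos = 0

    def expect(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        k, v = tokens[pos]
        if k != kind or (value is not None and v.lower() != value):
            raise SyntaxError(f"unexpected token {v!r}")
        pos += 1
        return v

    keyword = expect("ident").lower()
    if keyword not in ("uses", "contains"):
        raise SyntaxError(f"expected 'uses' or 'contains', got {keyword!r}")

    units = []
    while True:
        name = expect("ident")                      # UnitRef ::= RefID
        path = None
        if pos < len(tokens) and tokens[pos][0] == "ident" \
                and tokens[pos][1].lower() == "in":
            pos += 1                                # UnitRef ::= RefID IN StringLiteral
            path = expect("string")[1:-1].replace("''", "'")
        units.append((name, path))
        if pos < len(tokens) and tokens[pos] == ("punct", ","):
            pos += 1                                # UnitList ::= UnitList ',' UnitRef
            continue
        break
    expect("punct", ";")
    return units
```

Fed the project-file example from the question, the sketch treats `{Form1}` as a comment and returns the unit names paired with their paths (or `None` where no `in` part is given). A 'contains' clause goes through the same code path.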

Fwiw, Gold looks quite a nice package, except for a few problems including

  • the restriction mentioned in the ReadMe that it can only handle characters 0..127 and

  • its Parser Builder (v.5.2) complains when running using the D7 sample grammar that comes with it (before my suggested changes) about an invalid start symbol and a lexical error on line/state 82. Maybe I've missed something ...

MartynA
  • First, thank you for answering. Here are the results of the changes: Gold rejected "<RefID> in <StringLiteral>", so I altered it to be "<RefID> in StringLiteral". With that, Gold is happy. But we still have problems: when actually parsing a *.dpk file, the 'in' bit throws an error. Does that mean we have to define a separate rule for 'in'? If so, how? – Adem Mar 09 '16 at 13:03
  • I am using XE2, I didn't receive any "invalid start symbol and a lexical error" with or without the changes. – Adem Mar 09 '16 at 13:05
  • Odd. What version of the Builder and D7 grammar are you using? My builder is 5.2 as I said and the grammar file, D7Grammar.Grm is 31163 bytes, dated 28 August 2006 and is in the file D7_v11.Zip. Until I get the builder working w/o complaint I can't see that I can help much more. – MartynA Mar 09 '16 at 13:14
  • Btw, try "IN" instead of "in" in the definitions of UnitRef - that's how it is written in the definition of - " ::= '=' | '>' | '<' | '<=' | '>=' | '<>' | IN | IS | AS " and as the builder accepts "IN" there, it ought to accept & handle it in UnitRef. – MartynA Mar 09 '16 at 13:30
  • I've got my copy of Builder to digest the supplied version of the D7 grammar by the not very elegant method of truncating the definition of FloatLiteral on line 82, before the period, which is where it was complaining before. As a result, I've managed to get it to accept a minor variant of the changes in the original version of my answer, which I've updated. – MartynA Mar 09 '16 at 14:28
  • Oh yes, I totally forgot about that. I had changed that to be "FloatLiteral = {Digit}+'.'+{Digit}+" and that bit had worked. – Adem Mar 09 '16 at 15:04
  • Changing the 'in' to uppercase 'IN' didn't help. I guess the 'in' in this case is very much different from the 'in' that is used in 'foreach'. If a new rule needs to be defined for this new 'in', it definitely is beyond me. – Adem Mar 09 '16 at 15:06
  • About the versions: the header in the grammar file says it is v1.1; Gold is v5.2.0. I am not sure Gold still has that 0..127 char limitation. I couldn't find any reference to it in the what's-new notes. Is there a simple way to check whether that limitation has been removed? – Adem Mar 09 '16 at 15:10
  • Back to the 'IN' problem again: When I run the Gold Parser GUI and load the 'D7Grammar.grm' file with the changes you suggested, it throws an error when doing 'Project | Create LALR Parse tables'. The error is about 'RefID' not being acceptable here --apparently we DO have to have a different rule for this new 'in'. – Adem Mar 09 '16 at 15:39
  • " Is there a simple way to check whether that limitation" Sorry, no idea except to look at the source of whatever it is that imposes the limitation or maybe feed it some unicode and see if it croaks. – MartynA Mar 09 '16 at 15:59
  • I have just realized that the errors I was getting weren't due to your suggestions. Gold v5.2 seems to find a shift-reduce error even though earlier versions allowed it. So, I will accept your answer and open another question for the error. – Adem Mar 09 '16 at 17:21
  • If you accept my ans that would be great. Now you've piqued my curiosity, I'll take a look at your new q when you post it. Meanwhile, are you aware of the Castalia parser (https://github.com/jacobthurman/Castalia-Delphi-Parser) - it's the one currently used in Castalia (therefore quite up-to-date), freeware for non-comm use and based on Martin Waldenburg(?)'s venerable parser, so it has a good pedigree. – MartynA Mar 09 '16 at 17:28
  • How to accept? Hover mouse near the upvotes on the left and a green tick should appear. – MartynA Mar 09 '16 at 17:29
  • I know about the parser Castalia uses (though I very much doubt the published sources are what they are using in their production code). While it is fast, it is quite hard to grasp and update for new Delphi additions. That is why I am trying other alternatives. So far, PasParse [ https://github.com/Turbo87/PasParse ] is quite promising; I am looking at Gold as an alternative. – Adem Mar 09 '16 at 17:42