
I've got a bunch of ACPI Source Language files and I want to calculate file-to-file similarities between them. I thought of using something like Perl's Parse::RecDescent, but I am stuck at:

1) Translating the ACPI grammar (www.acpi.info/DOWNLOADS/ACPIspec40a.pdf) into something Parse::RecDescent would understand
2) Finding a metric to compare two parsed files

Any ideas?

719016
  • All you want is a "similarity" rather than actual deltas? Why won't just counting the number of lines that diff produces give a useful similarity number? Maybe you want to compare the files using their syntax trees? – Ira Baxter May 03 '11 at 14:10
  • Yes, I want to compare them using their contents according to the common syntax – 719016 May 06 '11 at 12:12
  • By the way, if you are using Perl 5.10 or newer, I would recommend using [Regexp::Grammars](http://search.cpan.org/dist/Regexp-Grammars/lib/Regexp/Grammars.pm) instead of Parse::RecDescent – Pablo Marin-Garcia May 22 '11 at 00:37

2 Answers

  1. To get started with Parse::RecDescent you may look at Pro Perl Parsing, Ch. 5 or at Advanced Perl Programming, Ch. 2
  2. XML diff tools should be appropriate for comparing hierarchically structured data; perhaps you can apply such a tool to ASTs saved in XML format
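
To illustrate the second point, here is a minimal sketch of saving an AST as XML so that an XML diff tool could be pointed at it. It is written in Python rather than Perl purely for illustration. The tuple layout `(label, attrs, children)`, the function name `ast_to_xml`, and the node labels are all assumptions, not the output of any real ACPI parser.

```python
import xml.etree.ElementTree as ET

def ast_to_xml(node):
    """Convert a hypothetical (label, attrs, children) tuple into an XML element."""
    label, attrs, children = node
    elem = ET.Element(label, attrs)
    for child in children:
        elem.append(ast_to_xml(child))
    return elem

# Hypothetical AST for an ASL fragment such as: Method (_STA, 0) { Return (0x0F) }
ast = ("Method", {"name": "_STA", "args": "0"}, [
    ("Return", {}, [("Integer", {"value": "0x0F"}, [])]),
])

# Serialize to a string; two such strings (or files) can then be fed to an XML diff tool.
xml_text = ET.tostring(ast_to_xml(ast), encoding="unicode")
print(xml_text)
```

Whether the resulting diff is meaningful depends on how the XML diff tool treats element order and attributes, so that part would still need experimenting.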
Ekkehard.Horner

So you have two problems:

  • Parsing ACPI to build an AST. This has the usual troubles: ensuring that you have a well-defined grammar, that your parsing machinery can parse according to that grammar (often you have to bend a good grammar definition to enable the parsing machinery to process it), and building a corresponding AST. You will have these troubles with Perl parsing machinery too, simply because it is a parsing engine.

  • Comparing the structure of the ASTs and producing a sensible answer. What you are likely to find here is that there is some literature describing roughly how to do this (using e.g. Levenshtein distance), but that the details for ASTs matter. (See "Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction".) Finally, having determined the distance, you need to print out the deltas in some readable form.
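
To make the second problem concrete, here is a deliberately simple sketch of a tree similarity score. It is not Levenshtein distance and not the Change Distilling algorithm; it is a much cruder stand-in that counts how many identical subtrees two ASTs share (a Dice coefficient over multisets of subtree fingerprints). The `(label, children)` tuple layout, the function names, and the node labels are assumptions made for illustration, and the sketch is in Python rather than Perl.

```python
from collections import Counter

def fingerprints(node, bag):
    """Add one canonical string per subtree of `node` to `bag`; return the string for `node`."""
    label, children = node
    text = label + "(" + ",".join(fingerprints(c, bag) for c in children) + ")"
    bag[text] += 1
    return text

def similarity(tree_a, tree_b):
    """Dice coefficient over the multisets of subtree fingerprints (1.0 means identical)."""
    bag_a, bag_b = Counter(), Counter()
    fingerprints(tree_a, bag_a)
    fingerprints(tree_b, bag_b)
    shared = sum((bag_a & bag_b).values())  # multiset intersection
    total = sum(bag_a.values()) + sum(bag_b.values())
    return 2.0 * shared / total if total else 1.0

# Two hypothetical ASTs that differ only in the type of the returned value.
a = ("Method", [("Name", []), ("Return", [("Integer", [])])])
b = ("Method", [("Name", []), ("Return", [("String", [])])])

print(similarity(a, a))  # 1.0
print(similarity(a, b))  # 0.25
```

The low score for a one-leaf change shows why the details matter: a change deep in the tree also changes the fingerprint of every ancestor, so a practical metric would need partial matching of subtrees rather than exact equality. This sketch also ignores identifiers and literal values entirely.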

However, AFAIK, my company is the only one that has reduced this to practice. See our Smart Differencer tool. The SmartDifferencers parse, build ASTs, and report changes in terms of AST elements moved, inserted, deleted, replaced, or modified by consistent identifier substitution. They depend on an underlying, very strong GLR parsing engine, which minimizes the problems of accepting new grammars. They work for many common languages but not presently for ACPI.

Ira Baxter