Summary and basic question
Using MS Access 2010 and VBA (sigh..)
I am attempting to implement a specialized Diff function that is capable of outputting a list of changes in different ways depending on what has changed. I need to be able to generate a concise list of changes to submit for our records.
I would like to use something such as HTML tags, like <span class="references">These are references 1, 6</span>, so that I can review the changes with code and customize how the change text is output. Anything else that accomplishes my task would also be welcome.
I see this as a way to provide an extensible way to customize the output, and possibly move things into a more robust platform and actually use html/css.
Does anyone know of a similar project that may be able to point me in the right direction?
My task
I have an Access database with tables of work operation instructions, typically 200-300 operations, many of which change from one revision to the next. I have currently implemented a function that iterates through the tables, finds instructions that have changed, and compares them.
Note that each operation instruction is typically a couple sentences with a couple lines at the end with some document references.
My algorithm is based on "An O(ND) Difference Algorithm and Its Variations" and it works great.
Access supports "Rich" text, which is just glorified simple HTML, so I can easily generate the full text with formatted additions and deletions, i.e. by adding tags like <font color = "red"><strong><i>This text has been removed</i></strong></font>. The main output from the Diff procedure is the full text of the operation, with unchanged, deleted, and inserted text inline with each other. The Diff procedure adds <del> and <ins> tags that are later replaced with the formatting tags (the result is similar to the view of changes in Stack Exchange edits).
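To make the two stages concrete, here is a minimal sketch in Python rather than VBA, intended only to show the shape of the logic. It uses the standard library's difflib.SequenceMatcher as a stand-in, which is not the O(ND) algorithm mentioned above. The function names diff_words and to_rich_text, and the green color for insertions, are assumptions made for illustration.

```python
import difflib

def diff_words(old, new):
    """Return the merged text with <del>/<ins> around changed words."""
    a, b = old.split(), new.split()
    out = []
    # Stand-in diff engine; the real procedure uses an O(ND) implementation.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append(" ".join(a[i1:i2]))
            continue
        if i2 > i1:  # words present only in the old text
            out.append("<del>" + " ".join(a[i1:i2]) + "</del>")
        if j2 > j1:  # words present only in the new text
            out.append("<ins>" + " ".join(b[j1:j2]) + "</ins>")
    return " ".join(out)

def to_rich_text(marked):
    """Swap the neutral <del>/<ins> markers for Access-style formatting tags."""
    replacements = {
        "<del>": '<font color="red"><strong><i>',
        "</del>": "</i></strong></font>",
        "<ins>": '<font color="green"><strong>',  # assumed color for insertions
        "</ins>": "</strong></font>",
    }
    for marker, formatting in replacements.items():
        marked = marked.replace(marker, formatting)
    return marked
```

Keeping the markers neutral until the last step means the same diff output can feed both the formatted view and a separate change list.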
However, like I said, I need the changes listed in human readable format. This has proven difficult because of the ambiguity many changes create.
For example: if a type of chemical is being changed from "Class A" to "Class C", the change text that is easily generated is "Change 'A' to 'C'", which is not very useful to someone reviewing the changes. More common are changes to the document references at the end: adding SOP 3 to a list, giving "SOP 1, 2, 3", generates the text "Add '3'". Clearly not useful either.
What would be most useful is a custom output for text designated as "SOP" text so that the output would be "Add reference to SOP 3".
I started with the following solution:
Group words together, e.g. treat text such as "SOP 1, 2, 3" as one token to compare. This generates the text "Change 'SOP 1, 2' to 'SOP 1, 2, 3'". This gets cluttered when there is a large list and you are attempting to determine what actually changed.
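For the SOP case specifically, the grouped old and new tokens could be compared number by number instead of as whole strings. The following is a minimal sketch in Python rather than VBA, assuming the references always look like "SOP" followed by comma-separated numbers. The names parse_sop_numbers and describe_sop_change, and the "Remove reference to" wording, are hypothetical.

```python
import re

def parse_sop_numbers(text):
    """Extract the numbers from text such as 'SOP 1, 2, 3'."""
    return [int(n) for n in re.findall(r"\d+", text)]

def describe_sop_change(old, new):
    """List the individual references added or removed between two SOP tokens."""
    old_nums = parse_sop_numbers(old)
    new_nums = parse_sop_numbers(new)
    changes = []
    for n in new_nums:
        if n not in old_nums:
            changes.append(f"Add reference to SOP {n}")
    for n in old_nums:
        if n not in new_nums:
            changes.append(f"Remove reference to SOP {n}")  # assumed wording
    return changes
```

With this, comparing "SOP 1, 2" against "SOP 1, 2, 3" yields only "Add reference to SOP 3", however long the list is.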
Where I am now
I am now attempting to add extra HTML tags before running the diff algorithm. For example, I will run the text through a "pre-processor" that wraps text such as "SOP 1, 2" in a <span> tag whose class identifies it as an SOP reference.
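A pre-processor of this kind could be sketched with a single regular expression. This is Python rather than VBA and only illustrates the idea; the class name "sop", the pattern, and the function name preprocess are assumptions, not the actual implementation.

```python
import re

# "SOP" followed by one number and any further comma-separated numbers.
SOP_LIST = re.compile(r"\bSOP\s+\d+(?:\s*,\s*\d+)*")

def preprocess(text):
    """Wrap every SOP reference list in a span with an assumed class name."""
    return SOP_LIST.sub(
        lambda m: f'<span class="sop">{m.group(0)}</span>', text
    )
```

In VBA the same pattern should be usable through the VBScript RegExp object, which supports non-capturing groups.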
Once the Diff procedure returns the full change text, I scan through it noting the current "class" of text, and when there is a <del> or <ins> I capture the text between the tags and use a SELECT CASE block over the class to address each change.
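The scan described above can be sketched as a small tokenizer plus a dispatch on the class. This is Python rather than VBA, so an if/else chain stands in for the SELECT CASE block. The class name "sop", the fallback class "plain", and the function names are assumptions made for illustration.

```python
import re

# One alternative per kind of token: span open, span close, change open,
# change close, and plain text.
TOKEN = re.compile(
    r'<span class="([^"]+)">|</span>|<(del|ins)>|</(?:del|ins)>|([^<]+)'
)

def describe(cls, kind, text):
    """Stand-in for the SELECT CASE block: one branch per class."""
    verb = "Add" if kind == "ins" else "Remove"
    if cls == "sop":
        numbers = re.findall(r"\d+", text)
        return f"{verb} reference to SOP {', '.join(numbers)}"
    return f"{verb} '{text}'"

def list_changes(marked):
    classes = []   # stack of currently open span classes
    change = None  # "del" or "ins" while inside a change tag
    changes = []
    for m in TOKEN.finditer(marked):
        token = m.group(0)
        if token.startswith("<span"):
            classes.append(m.group(1))
        elif token == "</span>":
            if classes:
                classes.pop()
        elif m.group(2):
            change = m.group(2)
        elif token in ("</del>", "</ins>"):
            change = None
        elif change:
            cls = classes[-1] if classes else "plain"
            changes.append(describe(cls, change, token.strip()))
    return changes
```

Text outside a <del> or <ins> is skipped, so only the changed fragments reach the per-class branch.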
This actually works okay for the most part, but there are many issues that I have to work through, such as Diff deciding that the shortest path is to delete certain opening tags and insert other ones. This creates a scenario in which there are two <span> tags but only one </span> tag.
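One possible way around the unbalanced tags, offered only as a sketch, is to keep the tags out of the diff input altogether: carry the class as an attribute of each token, diff the (word, class) pairs, and never let the diff see an opening or closing tag. The example is in Python rather than VBA, uses difflib.SequenceMatcher as a stand-in for the O(ND) diff, and the names tokenize and diff_pairs are hypothetical.

```python
import difflib
import re

# Span open, span close, or a single word / comma.
TOKEN = re.compile(r'<span class="([^"]+)">|(</span>)|([^<\s,]+|,)')

def tokenize(text):
    """Turn tagged text into (word, class) pairs; the tags themselves vanish."""
    pairs, current = [], None
    for m in TOKEN.finditer(text):
        if m.group(1):
            current = m.group(1)   # entering a classed span
        elif m.group(2):
            current = None         # leaving it
        else:
            pairs.append((m.group(3), current))
    return pairs

def diff_pairs(old, new):
    """Return (kind, word, class) for every deleted or inserted token."""
    a, b = tokenize(old), tokenize(new)
    changes = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            changes += [("del", word, cls) for word, cls in a[i1:i2]]
        if op in ("insert", "replace"):
            changes += [("ins", word, cls) for word, cls in b[j1:j2]]
    return changes
```

Because every changed token already carries its class, any tags needed for display can be regenerated afterwards, and they are balanced by construction.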
The ultimate question
I am looking for advice on whether to continue in the direction I have started or to try something different before investing a lot more time in a sub-optimal solution.
Thanks all in advance.
Also note:
The time for a typical run is approximately 1.5 to 2.5 seconds with me attempting more fancy things and a bunch of debug.prints. So running through an extra pass or two wouldn't be killer.