I have HTML files for several versions of the same website, and I need a way to measure and quantify the change between versions.
What is a good way to measure the change between HTML files? Is there an established way to do this?
What is a good way to do this at scale using R or Python?
I have tried counting the number of lines and the number of tags in each HTML file. I expect this to give me a rough idea of the magnitude of change, but I wonder whether there is a better way to do this.
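
For reference, the counting approach I tried looks roughly like the sketch below. It uses only Python's standard-library `html.parser`; the names `TagCounter`, `summarize`, and `compare` are made up for illustration, and the two sample strings are invented stand-ins for real page versions.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening (or self-closing) tag seen in a document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        self.tags[tag] += 1


def summarize(html_text):
    """Return line count, total tag count, and per-tag counts for one file."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return {
        "lines": len(html_text.splitlines()),
        "tags": sum(parser.tags.values()),
        "by_tag": parser.tags,
    }


def compare(old_html, new_html):
    """Return the difference in counts between two versions (new minus old)."""
    old, new = summarize(old_html), summarize(new_html)
    by_tag_delta = {}
    for tag in set(old["by_tag"]) | set(new["by_tag"]):
        delta = new["by_tag"][tag] - old["by_tag"][tag]
        if delta:
            by_tag_delta[tag] = delta
    return {
        "line_delta": new["lines"] - old["lines"],
        "tag_delta": new["tags"] - old["tags"],
        "by_tag_delta": by_tag_delta,
    }


# Invented sample versions of a page (assumption, not real data).
old_version = "<html>\n<body>\n<p>a</p>\n</body>\n</html>"
new_version = "<html>\n<body>\n<p>a</p>\n<p>b</p>\n<img src='x.png'>\n</body>\n</html>"

result = compare(old_version, new_version)
print(result["line_delta"], result["tag_delta"], result["by_tag_delta"])
```

The limitation I see is that these counts only capture net growth or shrinkage: replacing one paragraph with a completely different one would show up as no change at all.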