I'm trying to write some tests for my code to ensure that an HTML file generated by Folium is consistent.

My initial thought was to use MD5 checksums. The test I've written generates a Folium map, saves it to HTML, and then computes the checksum using the following code (based on the excellent answers here)...

import hashlib

with open(plot_journey.journey.journey_id + '.html', 'rb') as f:
    html_map = f.read()
assert hashlib.md5(html_map).hexdigest() == '12a7073a77278705ca1bfa5446b2a78c'

...but this fails because the MD5 checksum calculation includes the date/time of the file's creation, which in turn changes each time the test is run.

I don't know of any way around this. I had a quick try of sha256sum and it too uses the date/time of file creation.

Is this even a sensible approach to be taking? I'm thinking not, but have no idea how to write a test to ensure that a given set of data is consistently plotted by Folium and saved to HTML.

slackline
  • Why not remove the dates from both files and hard-code the hash of the remaining part? – Arnie97 Mar 25 '19 at 13:49
  • I had no idea that you could remove the file creation date from a specific file on a filesystem, I thought creation dates were an inherent part of any filesystem (the date is not in the filename at all btw). Do you know off the top of your head the command for doing this under GNU/Linux? – slackline Mar 25 '19 at 14:20
  • The file creation time on the file system doesn't matter for your code above. There should be something slightly different in the file content itself, so I thought you were talking about some date strings in the file. Try `diff` or `git diff` since you are using Linux. – Arnie97 Mar 25 '19 at 14:25
  • Ah, OK, perhaps there is something in the underlying Folium map that differs. I can print the MD5 from the Python test and then run `md5sum` on the file at the command line, and they match. But if I take that MD5 and put it back into the test, it fails when re-run, which made me think the file creation time on the system was causing the failure, as that was the only thing I could think of that was varying. – slackline Mar 25 '19 at 14:41
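
Following the suggestion in the comments to strip whatever varies and hash the rest, here is a minimal sketch. It rests on an assumption that should be confirmed with `diff`: that the only run-to-run differences in the generated HTML are 32-character hexadecimal element identifiers (for example in names like `map_<hex>`). The helper name `normalized_md5` is hypothetical and not part of Folium or `hashlib`.

```python
import hashlib
import re

# Assumption: the parts of the HTML that change between runs are
# 32-character lowercase hex identifiers. Verify this with `diff` first.
HEX_ID = re.compile(r"[0-9a-f]{32}")


def normalized_md5(html: str) -> str:
    """Return the MD5 of the HTML with each distinct hex id replaced by a stable placeholder."""
    seen = {}

    def stable(match):
        # Number ids in order of first appearance, so two references to the
        # same id still map to the same placeholder.
        return seen.setdefault(match.group(0), f"id{len(seen)}")

    normalized = HEX_ID.sub(stable, html)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Two hypothetical outputs that differ only in their random ids hash the same.
run_a = '<div id="map_' + "a" * 32 + '"></div>'
run_b = '<div id="map_' + "c" * 32 + '"></div>'
assert normalized_md5(run_a) == normalized_md5(run_b)
```

In the test, the file would be read as text and passed to `normalized_md5` in place of calling `hashlib.md5` on the raw bytes. If `diff` shows other varying content, the pattern would need adjusting accordingly.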
