2

I need to generate etags for image files on the web. One of the possible solutions I thought of would be to calculate CRCs for the image files, and then use those as the etag.

This would require CRCs to be calculated every time someone requests an image on the server, so it's very important that it can be done fast.

So, how fast are algorithms to generate CRCs? Or is this a stupid idea?

Oliver
  • 1
    I think that if your images are not changing every time you do not need to calculate the CRCs at every request of such static content. – Davide Piras Oct 26 '11 at 11:19
  • As @Davide suggested, why not just generate the CRC (or better, an SHA1/MD5 hash) and keep it somewhere (file, database, etc.) so that you don't need to calculate it every time the image is requested? – Karl Nicoll Oct 26 '11 at 11:21
  • Very similar to http://stackoverflow.com/questions/2285482/getting-etags-right – Tim Rogers Oct 26 '11 at 11:22
  • I'm dynamically generating the images (to different widths, borders, etc) depending on the user query, so it's a little hard to do this. – Oliver Oct 26 '11 at 11:23

4 Answers

5

Instead, use a more robust hashing algorithm such as SHA1.

Speed depends on the size of the image. Most time will be spent on loading data from the disk, rather than in CPU processing. You can cache your generated hashes.

But I would also advise creating the etag from the file's last update date, which is much quicker and does not require loading the whole file.

Remember, an etag only has to be unique for a particular resource, so it is fine if two different images have the same last update time.
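A timestamp-based etag could look something like the Python sketch below. The name `mtime_etag` is made up for the example, and mixing in the file size is an extra assumption on my part (it is cheap because it comes from the same `os.stat` call); the file contents are never read.

```python
import os


def mtime_etag(path):
    """Build a quoted ETag from the file's modification time and size.

    Only file metadata is consulted, so the cost does not grow with
    the size of the image.
    """
    st = os.stat(path)
    # Whole-second modification time and byte size, both as hex.
    return '"{:x}-{:x}"'.format(int(st.st_mtime), st.st_size)
```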

Aliostad
  • +1 for mention that sha1 is more robust and against the uncommented downvote – sra Oct 26 '11 at 11:29
  • 3
    I wasn't the downvoter, but I'd guess that it was because if someone is worried about the speed of CRC calculation, then suggesting SHA-1 is counter-productive, as it'll be slower than all but the crumbiest CRC implementation. – Jon Hanna Oct 26 '11 at 11:33
  • If the etag is based on a timestamp, different versions of the same image could end up with the same etag if they can change within the resolution used. Hence if e.g. the fastest they could possibly change was within a fifth of a second, you should base the timestamp at least to the nearest 10th, so they couldn't possibly coincide. – Jon Hanna Oct 26 '11 at 12:44
  • Yes, it is true. I *think* in fact IIS generates etag purely based on filename so it never changes but it sends last update time. – Aliostad Oct 26 '11 at 12:50
  • Nope, IIS by default generates e-tags based on a machine-key and a change-count for the file. This is a bugger when you're using a web-farm as the machine-keys will differ, but that can be changed so that it's consistent between machines. – Jon Hanna Oct 26 '11 at 14:44
2

Most implementations, including Microsoft's own, use the last modified date or other file headers as the ETag, and I suggest you use that method.

Tim Rogers
1

I would suggest calculating the hash once, when the image is added to the database, and then just returning it with a SELECT along with the image itself.

If you are using SQL Server and the images are not very large (max 8000 bytes), you can leverage the HASHBYTES() function, which can generate SHA-1, MD5, ...
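The "hash once on insert, read it back with the image" idea can be sketched as follows. Note the assumptions: this uses Python with an SQLite database rather than SQL Server, the hash is computed in application code instead of with HASHBYTES(), and the table and function names (`images`, `create_store`, `add_image`, `get_image`) are invented for the example.

```python
import hashlib
import sqlite3


def create_store(conn):
    """Create the (hypothetical) images table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "name TEXT PRIMARY KEY, data BLOB NOT NULL, etag TEXT NOT NULL)"
    )


def add_image(conn, name, data):
    """Store the image and its SHA-1 hash; the hash is computed only here."""
    etag = hashlib.sha1(data).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO images (name, data, etag) VALUES (?, ?, ?)",
        (name, data, etag),
    )
    return etag


def get_image(conn, name):
    """Return (data, etag) for the image, or None if it is not stored."""
    return conn.execute(
        "SELECT data, etag FROM images WHERE name = ?", (name,)
    ).fetchone()
```

Serving a request then costs one SELECT and no hashing at all.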

sll
1

Depends on the method used, and the length. Generally pretty fast, but why not cache them?
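To get a feel for "pretty fast" on your own hardware, something like this Python sketch would do. It uses the standard library's `zlib.crc32`; the helper names `crc32_etag` and `time_crc32` are made up for the example, and the timings depend entirely on the machine, so none are quoted here.

```python
import time
import zlib


def crc32_etag(data):
    """Return a quoted, 8-hex-digit CRC-32 ETag for the given bytes."""
    return '"{:08x}"'.format(zlib.crc32(data) & 0xFFFFFFFF)


def time_crc32(size_bytes, repeats=5):
    """Return the best wall-clock time (seconds) to CRC a zero-filled buffer."""
    data = bytes(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.crc32(data)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `time_crc32` with a typical image size (say `time_crc32(2 * 1024 * 1024)`) gives a number to compare against the cost of generating the image in the first place.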

If there won't be changes to the files more often than the resolution of the system used to store it (that is, of file modification times for the filesystem or of SQLServer datetime if stored in a database), then why not just use the date of modification to the relevant resolution?

I know RFC 2616 advises against the use of timestamps, but this is only because HTTP timestamps have 1-second resolution and changes can be more frequent than that. However:

  1. That's still fine if you don't change images more than once a second.
  2. It's also fine to base your e-tag on the time as long as the precision is great enough that it won't end up with the same for two versions of the same resource.

With this approach you are guaranteed a unique e-tag (collisions are unlikely with a large CRC but certainly possible), which is what you want.
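The resolution point can be shown with a little arithmetic. In this Python sketch the function name `timestamp_etag` and the sample timestamps are invented for illustration: two versions saved a fifth of a second apart collide at 1-second resolution but not at a tenth of a second.

```python
SECOND_NS = 1_000_000_000


def timestamp_etag(modified_ns, resolution_ns):
    """Quoted ETag from a modification time truncated to the given resolution."""
    return '"{:x}"'.format(modified_ns // resolution_ns)


# Two hypothetical versions of one image, saved 0.2 s apart.
v1 = 5 * SECOND_NS + 100_000_000  # modified at 5.1 s
v2 = 5 * SECOND_NS + 300_000_000  # modified at 5.3 s

# 1-second resolution: both truncate to 5, so the e-tags coincide.
same = timestamp_etag(v1, SECOND_NS) == timestamp_etag(v2, SECOND_NS)

# 0.1-second resolution: 51 versus 53, so the e-tags differ.
differ = timestamp_etag(v1, SECOND_NS // 10) != timestamp_etag(v2, SECOND_NS // 10)
```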

Of course, if you don't ever change the image at a given URI, it's even easier as you can just use a fixed string (I prefer string "immutable").

Jon Hanna