- Are those libraries yours, I mean, is it your code?
- Are they installed by an installer of yours, or independently, and you just inspect them?
If you can in any way supervise their initial installation on the target machine, you can do some poor-man's watermarking with plain old DLL resources.
Attach a binary resource with your own contents to each version of the DLL installed and then inspect that later. It is much as if you embedded a public static class Something { public static readonly SomeData MyImportantInformation = ...; } in each build and read it at runtime, or as if you put [Attributes] with the data on some classes and read them through reflection. Using binary resources has two small advantages, though:
- you can add/remove the resources from a DLL after it has been built (a bit like with the ILMerge tool)
- you can read the resources from native code just as easily as from managed code, and to read them you can load the DLL in a very limited and resource-saving way
Mind that I mean the low-level (Win32) resources, such as the manifest, which usually sits under resource ID 1 (ID 2 in DLLs), or the .exe/.dll icons.
On binary resources:
http://www.codeproject.com/Articles/4221/Adding-and-extracting-binary-resources
And on managed embedded resources, which are easier to use:
http://keithelder.net/2007/12/14/how-to-load-an-embedded-resource-from-a-dll/
https://stackoverflow.com/a/7978410/717732
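For the managed flavour, reading the watermark back could look roughly like the sketch below. The class name WatermarkReader and the resource name in the usage note are made up for illustration; the only real API used is Assembly.GetManifestResourceStream, which returns null when the resource is not there.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class WatermarkReader
{
    // Returns the raw bytes of a managed embedded resource,
    // or null when the assembly carries no resource with that name.
    public static byte[] ReadResource(Assembly assembly, string resourceName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            if (stream == null)
                return null; // no watermark in this DLL

            using (var buffer = new MemoryStream())
            {
                stream.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }
}
```

You would call it as WatermarkReader.ReadResource(Assembly.LoadFrom(path), "MyCompany.Watermark.bin"), where the resource name is whatever your build embedded (hypothetical here). Assembly.GetManifestResourceNames() lists the names actually present if you are not sure.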
You can add the adding/modifying of the resources to your build scripts, to be sure that each published version has different/correct information added. Of course, if you control the build process, you may just as well fire up the aforementioned ILMerge to put anything into any DLL. Most of that would work, but in general I think it is overkill, and if done improperly it will break any security signatures: modifying the DLL after it has been signed invalidates the signature, so it has to be done before signing.
If you control the build process, you can just put the necessary versioning information in the code as class-static data, or simply as attributes at assembly level, or (...)
Or why don't you just use version numbers to differentiate the versions? :) i.e. semantic versioning?
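A minimal sketch of the "attributes at assembly level plus version numbers" route, assuming the build stamps each DLL with something like [assembly: AssemblyMetadata("BuildId", "...")] (the key "BuildId" and the helper name BuildInfo are my inventions, not anything standard):

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class BuildInfo
{
    // Collects the version number and any assembly-level metadata
    // attributes into one human-readable line.
    public static string Describe(Assembly assembly)
    {
        Version version = assembly.GetName().Version;

        // Free-form version string, if the build set one.
        var info = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>();

        // Key/value pairs added with [assembly: AssemblyMetadata(key, value)].
        var metadata = assembly.GetCustomAttributes<AssemblyMetadataAttribute>()
                               .Select(m => m.Key + "=" + m.Value);

        var parts = new[]
        {
            "Version=" + version,
            "Informational=" + (info?.InformationalVersion ?? "(none)")
        };
        return string.Join("; ", parts.Concat(metadata));
    }
}
```

BuildInfo.Describe(Assembly.LoadFrom(path)) then gives you one string per DLL that you can log or compare. Note that this only helps when the attributes were put there at build time, so it is no use for DLLs you do not build yourself.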
On the other hand, if you are working with DLLs that are not yours and you have no control over their deployment, then you are on tough ground. As others said, compilers may apply many different tricks during compilation, but, please note, there are both "legal" and "logic" restrictions on what they can do to the compiled code.
Example of "logic" constraints:
- they may change the instructions, but may not change the overall meaning and (side)effects
- they may change both the code and data layout/structure, but not in a way that would change the algorithms to handle them
etc
Example of "legal" constraints:
- they are not allowed to remove any public symbol (public = visible to other code modules; in .NET that covers public and protected, and sometimes even internal and private)
- they are not allowed to change the name of any public symbol
- they are not allowed to change the signature of any public symbol
etc
Now, if you limit yourself to only such information, you can gather/calculate hashes/signatures of any code in a way that has a chance of being compiler- and platform-independent. You will not get a definitive answer as to whether they are the same or not, but you will get a view of how probable it is that they are.
For the most trivial example: load the DLL via reflection and scan all classes for their public and non-public member names. Then either calculate a hash over that string set, or just keep the whole string set; it would probably be counted in kilobytes at most. If a large change is made to the code, it is almost certain that some fields/methods will be added or removed. For smaller changes, you may also scan the signatures of the methods: add the parameter lists, the parameter types and the return types to the pool. That is a bit more work, and a higher probability of detecting the change.
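Something along these lines would do as a first attempt. It is only a sketch: the names ApiFingerprint, Collect, Hash and Similarity are mine, and the Jaccard-style similarity score is just one possible way to turn "how probable" into a number. Also mind that non-public members include compiler-generated ones (closure classes, backing fields), whose names may differ between compilers, so you may want to filter those out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class ApiFingerprint
{
    const BindingFlags All = BindingFlags.Public | BindingFlags.NonPublic |
                             BindingFlags.Instance | BindingFlags.Static |
                             BindingFlags.DeclaredOnly;

    // One sorted string per member; methods also carry parameter and return types.
    public static SortedSet<string> Collect(Assembly assembly)
    {
        var entries = new SortedSet<string>(StringComparer.Ordinal);
        foreach (Type type in GetLoadableTypes(assembly))
        {
            foreach (MemberInfo member in type.GetMembers(All))
            {
                string entry = type.FullName + "::" + member.MemberType + " " + member.Name;
                if (member is MethodBase method)
                {
                    entry += "(" + string.Join(",",
                        method.GetParameters().Select(p => p.ParameterType.ToString())) + ")";
                    if (method is MethodInfo mi)
                        entry += ":" + mi.ReturnType;
                }
                entries.Add(entry);
            }
        }
        return entries;
    }

    // GetTypes() throws if some dependencies are missing; keep what did load.
    static IEnumerable<Type> GetLoadableTypes(Assembly assembly)
    {
        try { return assembly.GetTypes(); }
        catch (ReflectionTypeLoadException e) { return e.Types.Where(t => t != null); }
    }

    // SHA-256 over the sorted entries, as a 64-character hex string.
    public static string Hash(IEnumerable<string> entries)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(string.Join("\n", entries)));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // Shared entries divided by all distinct entries: 1.0 = identical sets.
    public static double Similarity(ISet<string> a, ISet<string> b)
    {
        if (a.Count == 0 && b.Count == 0) return 1.0;
        int common = a.Count(x => b.Contains(x));
        return (double)common / (a.Count + b.Count - common);
    }
}
```

Equal hashes mean the visible surface is the same, which says nothing about the method bodies. When the hashes differ, Similarity(Collect(a), Collect(b)) tells you whether it is "almost the same DLL" or something completely different.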
For a non-trivial change: you may try to scan the IL code of the methods and detect structures in it. Compilers may inline and sometimes remove methods/loops/etc., but the overall structure is preserved. Specific blocks of code are executed n times here or there, branches are in their place but maybe with the sides swapped, etc. However, detecting the control structures is not easy, and comparing the code is even harder. For some code it may give you a definitive answer of "exact same", but many times you will get "not same" even if they are. Some keywords on the subject are ... duplicate or plagiarism detector. This is how the research on such things started :) See e.g. https://stackoverflow.com/questions/546487/tools-to-identify-code-duplications though I do not know whether the tools mentioned scan the code or the "bytes".
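The crudest possible version of that, which only covers the "exact same" case, is to hash the raw IL bytes of each method. This is not structure detection at all, just a sketch under the assumption that the runtime exposes method bodies through MethodBase.GetMethodBody(); the name IlFingerprint is made up. The raw IL contains metadata tokens that can shift between builds, so a mismatch proves nothing, while a match is a fairly strong hint that the body is unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Security.Cryptography;

public static class IlFingerprint
{
    const BindingFlags All = BindingFlags.Public | BindingFlags.NonPublic |
                             BindingFlags.Instance | BindingFlags.Static |
                             BindingFlags.DeclaredOnly;

    // Maps "Type::signature" to the SHA-256 (hex) of that method's raw IL bytes.
    public static Dictionary<string, string> HashMethodBodies(Type type)
    {
        var result = new Dictionary<string, string>(StringComparer.Ordinal);
        using (var sha = SHA256.Create())
        {
            foreach (MethodInfo method in type.GetMethods(All))
            {
                // Abstract and extern methods have no body to hash.
                MethodBody body = method.GetMethodBody();
                byte[] il = body?.GetILAsByteArray();
                if (il == null) continue;

                // MethodInfo.ToString() includes return and parameter types,
                // so overloads get distinct keys.
                string key = type.FullName + "::" + method;
                result[key] = BitConverter.ToString(sha.ComputeHash(il)).Replace("-", "");
            }
        }
        return result;
    }
}
```

Run it over the same type from both DLLs and compare the dictionaries key by key. Methods with equal hashes can be skipped; only the remaining ones need the expensive structural comparison.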