Suppose you have two datasets and need to verify that they have not changed relative to each other. For example, you have one array of objects in one hand and another array in the other, and you need to verify that both arrays are exactly the same.
Each array can contain data of any type: booleans, strings, objects, arrays, NULL, etc.
For the arrays to count as equal, their contents must match exactly: same values, same data types, same order.
Instead of iterating over the array contents with code that can compare different types of data (possibly recursively), I came up with a solution, and I would be grateful if you could shed some light on any downsides it may have. PHP is the language, but I'm more interested in a language-neutral answer.
I serialized both datasets separately and calculated their md5 hashes. I chose md5 because it is available without external extensions or libraries and is quite fast. I am aware of the chance of a collision, and that md5 hashes are nowhere near cryptographically secure.
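Since the question asks for a language-neutral answer, here is a minimal sketch of the same idea in Python, where `pickle` stands in for PHP's `serialize()` and `hashlib.md5` for PHP's `md5()`. Note the assumption baked into this approach: the serializer must be deterministic, i.e. equal structures must always produce identical bytes.

```python
import hashlib
import pickle

def fingerprint(data):
    # Serialize the whole structure to bytes, then hash the bytes.
    # pickle stands in for PHP's serialize(); md5 mirrors the original choice.
    return hashlib.md5(pickle.dumps(data)).hexdigest()

a = [True, "text", {"k": 1}, [1, 2], None]
b = [True, "text", {"k": 1}, [1, 2], None]
c = [True, "text", {"k": 1}, [2, 1], None]  # nested order differs

print(fingerprint(a) == fingerprint(b))  # True: same values, types, and order
print(fingerprint(a) == fingerprint(c))  # False: the nested list order differs
```

One design caveat this sketch surfaces: the comparison is only as reliable as the serializer's determinism. If two logically equal values can serialize to different byte strings (for example, dictionaries with different insertion orders), the hashes will differ even though the data is "the same".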
My questions are:
- Is this a widely used method for comparing arbitrary data? Checksumming files makes sense, but I have not personally used checksums to compare in-memory variables like this.
- I'm mainly doing this to keep my code simple. A direct comparison is probably faster, because it can stop as soon as it finds the first mismatch. In my case the data is fairly small: about 5 kB as a serialized string.
- Are there any other downsides that I should know of?
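To illustrate the short-circuiting point from the list above: in most languages a plain deep equality check (`==` on arrays in PHP, `==` on lists in Python) already compares element by element, recursing into nested structures, and stops at the first mismatch, with no serialization or hashing cost at all. A tiny Python sketch:

```python
a = [1, "x", [True, None]]
b = [1, "x", [True, None]]
c = [2, "x", [True, None]]  # mismatch at the very first element

# Deep, type-sensitive, order-sensitive comparison built into the language.
print(a == b)  # True
print(a == c)  # False, decided as soon as the first elements differ
```

For ~5 kB of data either approach is effectively instant, so the trade-off is really about code clarity and the (small) collision risk, not speed.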
Thanks in advance.