I have a table in an excel sheet containing strings, with let's say 3 columns and 2 rows. E.g.:
- "one" , "two", "three"
- "four", "five", "six"
I want to come up with an algorithm that can calculate a checksum for the whole table, which would be different if a new row is added (obvious) or even if one string switch places with another one from another row (less obvious).
So for the following table, we would have a different checksum than the previous one:
- "one" , "five", "three"
- "four", "two", "six"
I know there are a lot of possible ways to accomplish this, but is there a clean & robust way to do it in python? I had thought about using the position of the string in the table as a factor in the calculation, but I don't know whether it's a good idea or not as checksum calculation is one of my weaker skills.
I feel I should mention that in that same excel file, I have other data that can change without modifying the checksum though. The checksum should only verify the above-mentioned table's data, and must be a numeric value (because I need to send it over a CAN bus afterwards).
Does anyone have any suggestions as to how I could accomplish this? Please be precise and give examples.
EDIT: So I came up with an algorithm but yet completely unsure about its robustness. My idea was to first calculate a checksum for each cell of the first column and multiply it by the row index, then do the same for the seconds column and the third (if we use the same table as in the description). Then finally, sum the resulting checksums by multiplying each one of them by their column index. So something like:
total_checksum = (checksum('one') * 1 + checksum('four') * 2) * 1 + (checksum('two') * 1 + checksum('five') * 2) * 2 + (checksum('three') * 1 + checksum('six') * 2) * 3
I willingly posted this calculation like this without any fancy code just to explain my point. Now for the code (not pretty at the moment sorry for this), we can assume that the table contents have been parsed into a list of dictionary with key=column and item=value. For the table from the example we have:
tab = [{"A": "one", "B": "two", "C": "three"},
{"A": "four", "B": "five", "C": "six"}]
print str(calculate_checksum(tab))
def calculate_checksum(table):
"""Calculate checksum of a dictionary (represents the excel table)"""
def checksum(st):
"""Convert a string into tables of ascii characters and reduce it"""
return reduce(lambda x, y: x + y, map(ord, st))
total_cks = column_count = 0
# list the data of each column:
for column in ['A', 'B', 'C']:
data_list = [data[column] for data in tables]
chk = 0
for row_index in range(len(data_list)):
chk += checksum(data_list[row_index]) * (row_index + 1) # multiply each cell's checksum by its row index
column_count += 1
total_cks += chk * column_count # multiply each column checksum by its column index
return total_cks
With this algorithm, the original table's checksum is 7254.When I switch "two" and "five", I get 7094 so it works. When I switch "one" and "two" I get 7230 that's covered too.