As an initial situation, I have a sha1 hash value. I want to compare this with a file full of hash values to see if the sha1 hash value is contained in the file with the hash values.
So more exactly:
f1=sha1 #value read in
fobj = open("Hashvalues.txt", "r") #open file with hash values
for f1 in fobj:
print ("hash value found")
else:
print("HashValue not found")
fobj.close()
The file is very large (11.1GB)
Is there a useful algorithm to perform the search as fast as possible? The hash values in the hash file are ordered by hashes. I think comparing this line by line won't be the fastest way, will it?
EDIT: I changed my Code as follows:
f1="9bc34549d565d9505b287de0cd20ac77be1d3f2c" #value read in
with open("pwned-passwords-sha1-ordered-by-hash-v5.txt") as f:
lineList = [line.rstrip('\n\r') for line in open("pwned-passwords-sha1-
ordered-by-hash-v5.txt")]
def binarySearch(arr, l, r, x):
while l <= r:
mid = l + (r - l)/2;
# Check if x is present at mid
if arr[mid] == x:
return mid
# If x is greater, ignore left half
elif arr[mid] < x:
l = mid + 1
# If x is smaller, ignore right half
else:
r = mid - 1
# If we reach here, then the element
# was not present
return -1
# Test array
arr = lineList
x = "9bc34549d565d9505b287de0cd20ac77be1d3f2c" #value read in
# Function call
result = binarySearch(arr, 0, len(arr)-1, x)
if result != -1:
print "Element is present at index % d" % result
else:
print "Element is not present in array"
But it doesn't work as fast as i thought. Is my Implementation correct?
EDIT2:
def binarySearch (l, r, x):
# Check base case
if r >= l:
mid = l + (r - l)/2
# If element is present at the middle itself
if getLineFromFile(mid) == x:
return mid
# If element is smaller than mid, then it
# can only be present in left subarray
elif getLineFromFile(mid) > x:
return binarySearch(l, mid-1, x)
# Else the element can only be present
# in right subarray
else:
return binarySearch(mid + 1, r, x)
else:
# Element is not present in the array
return -1
x = '0000000A0E3B9F25FF41DE4B5AC238C2D545C7A8:15'
def getLineFromFile(lineNumber):
with open('testfile.txt') as f:
for i, line in enumerate(f):
if i == lineNumber:
return line
else:
print('Not 7 lines in file')
line = None
# get last Element of List
def tail():
for line in open('pwned.txt', 'r'):
pass
else:
print line
ausgabetail = tail()
#print ausgabetail
result = binarySearch( 0, ausgabetail, x)
if result != -1:
print "Element is present at index % d" % result
else:
print "Element is not present in array"
My problem now is to get the correct index for the right side for the binary search. I pass the function (l, r, x). The left side starts at the beginning with 0. The right side should be the end of the file so the last line. I try to get that but it doesn't work. I tried to get this with the Funktion tail(). But if I print r on testing, I get the value "None". Do you have another idea here?