It is extremely probable the there is a more optimal solution to this problem that the one I reach below, but I wanted to show how I would approach the optimization of this code if I was assigned to this task. Also, I haven't executed any part of the code so you may find some syntactic errors on it.
============================================================================
Lets say that len(s) == n
, and len(queries) == m
.
Your current code is doing the following:
For each query, q:
1. find the character of the query, c
2. find the indices of other characters in the string that match c
3. find the closest index to the original index with the same character as the original index
Steps 1-3 are executed m
times, because there are m
queries.
And steps 2 and 3 have to iterate over the whole string s
(in worst case your string s
consist of the same character), so it executes n
steps.
So you are roughly executing 2n + 1
steps for each query, so in total you are executing roughly (2n + 1) * m
steps. This is (almost) what is called the runtime complexity of your algorithm. In big-O notation, the complexity would be: O(n*m)
.
Lets extract steps 2 and 3 into its own functions:
def findIdxListByPos(s, i):
foundidx = []
srchChr = s[i]
for j in range(0, len(s)):
if s[j] == srchChr:
foundidx.append(j)
return foundIdx
def findClosestIndex(foundidx, i):
# this is not needed because if the character appeared only once,
# foundidx will be empty and the "for fnditem in foundidx" will not
# do anything, so you can remove it
if len(foundidx) < 2:
return -1
lastIdx = -1
dist = 0
foundidx.remove(i)
for fnditem in foundidx:
if dist == 0:
lastIdx = fnditem
dist = abs(fnditem - i)
else:
if abs(fnditem - i) < dist:
lastIdx = fnditem
dist = abs(fnditem - i)
return lastIdx
def closest(s, queries):
s = s.lower()
listIdx = []
for i in queries:
foundidx = findIdxListByPos(s, i)
lastIdx = findClosestIndex(foundidx, i)
listIdx.append(lastIdx)
return listIdx
You can see that in findIdxListByPos
, you are always looking at every position in the string.
Now, lets say that you have a case where queries = [1, 1]
, then your are calculating two times the same foundidx
and the same lastIdx
. We can save that calculations and reuse them. That is, you save your foundidx
and lastIdx
inside another variables that are not lost after each query. You can do that in a dictionary with the character of the query as the key. If you have already calculated that key, you don't calculate again, just reuse it.
Your code will look like this:
def findIdxListByPos(s, i):
foundidx = []
srchChr = s[i]
for j in range(0, len(s)):
if s[j] == srchChr:
foundidx.append(j)
return foundIdx
def findClosestIndex(foundidx, i):
lastIdx = -1
dist = 0
foundidx.remove(i)
for fnditem in foundidx:
if dist == 0:
lastIdx = fnditem
dist = abs(fnditem - i)
else:
if abs(fnditem - i) < dist:
lastIdx = fnditem
dist = abs(fnditem - i)
return lastIdx
def calculateQueryResult(s, i, allFoundIdx):
srchChr = s[i]
if srchChr not in allFoundIdx:
allFoundIdx[srchChr] = findIdxListByPos(s, i)
foundidx = allFoundIdx[srchChr]
return findClosestIndex(foundidx, i)
def closest(s, queries):
s = s.lower()
listIdx = []
allFoundIdx = {}
queriesResults = {}
for i in queries:
if i not in queriesResults:
queriesResults[i] = calculateQueryResult(s, i, allFoundIdx)
listIdx.append(queriesResults[i])
return listIdx
This change increases the memory used by your algorithm and changes a little its runtime complexity.
Now, in the worst case you don't have any duplicates in your queries. What happens when you don't have duplicate queries? You have a query for each element in s
and all elements in s
are distinct!
queries = [0,1,2,...,n]
So len(queries) == n
, so n == m
then your algorithm has now a complexity of O(n*n) = O(n^2)
Now, you can see that in this worst case scenario, your allFoundIdx
dictionary will contain all the characters with all the positions in the string. So, memory wise is equivalent to calculating this dictionary upfront for all the values in the string. Calculating all upfront does not improve the runtime complexity but it does not make it worse either.
def findClosestIndex(foundidx, i):
lastIdx = -1
dist = 0
foundidx.remove(i)
for fnditem in foundidx:
if dist == 0:
lastIdx = fnditem
dist = abs(fnditem - i)
else:
if abs(fnditem - i) < dist:
lastIdx = fnditem
dist = abs(fnditem - i)
return lastIdx
def calculateAllFoundIdx(s):
allFoundIdx = {}
for i in range(0, len(s)):
srchChr = s[i]
# you should read about the get method of dictionaries. This will
# return an empty list if there is no value for the key srchChr in the
# dictionary
allFoundIdx[srchChr] = allFoundIdx.get(srchChr, []).append(i)
return allFoundIdx
def closest(s, queries):
s = s.lower()
listIdx = []
queriesResults = {}
# this has complexity O(n)
allFoundIdx = calculateAllFoundIdx(s)
# this still has complexity O(n^2) because findClosestIndex still has O(n)
# the for loop executes it n times
for i in queries:
if i not in queriesResults:
srchChr = s[i]
foundidx = allFoundIdx[srchChr]
queriesResults[i] = findClosestIndex(foundidx, i)
listIdx.append(queriesResults[i])
return listIdx
This algorithm is still O(n^2)
but now you just need to optimize the findClosestIndex
function, as there is no way to not iterate over all the queries.
So, in findClosestIndex
you are getting as parameters a list of numbers (the positions of some character in the original string) that is ordered incrementally (because of the way it was constructed), and another number of which you want to find the closest (this number is guaranteed to be included in the list).
The closest number (because the list is ordered) has to be the previous or the next one in the list. Any other number is "farther away" that these two.
So basically, you want to find the index of this number in the list and then the previous and next elements in the list, and compare their distances and return the smallest.
To find a number in an ordered list, you use binary search and you just have to be careful with the indices to obtain your final result:
def binSearch(foundidx, idx):
hi = len(foundidx) - 1
lo = 0
while lo <= hi:
m = (hi + lo) / 2
if foundidx[m] < idx:
lo = m + 1
elif found[m] > idx:
hi = m - 1
else:
return m
# should never get here as we are sure the idx is in foundidx
return -1
def findClosestIndex(foundidx, idx):
if len(foundidx) == 1:
return -1
pos = binSearch(foundidx, idx)
if pos == 0:
return foundidx[pos + 1]
if pos == len(foundidx) - 1:
return foundidx[pos - 1]
prevDist = abs(foundidx[pos - 1] - idx)
postDist = abs(foundidx[pos + 1] - idx)
if prevDist <= postDist:
return pos - 1
return pos + 1
def calculateAllFoundIdx(s):
allFoundIdx = {}
for i in range(0, len(s)):
srchChr = s[i]
# you should read about the get method of dictionaries. This will
# return an empty array if there is no value for the key srchChr in the
# dictionary
allFoundIdx[srchChr] = allFoundIdx.get(srchChr, []).append(i)
return allFoundIdx
def closest(s, queries):
s = s.lower()
listIdx = []
queriesResults = {}
# this has complexity O(n)
allFoundIdx = calculateAllFoundIdx(s)
# this has now complexity O(n*log(n)) because findClosestIndex now has O(log(n))
for i in queries:
if i not in queriesResults:
srchChr = s[i]
foundidx = allFoundIdx[srchChr]
queriesResults[i] = findClosestIndex(foundidx, i)
listIdx.append(queriesResults[i])
return listIdx
Now findClosestIndex
has a complexity of O(log(n))
, so closest
now has a complexity of O(n*log(n))
.
The worst case now is when all the elements in s
are the same, and queries = [0, 1, ..., len(s) - 1]