I have some code that builds a suffix array using a lambda expression as the sort key:
def ComputeArray(text):
    # text must end with the sentinel $
    if text[-1] != "$":
        text += "$"
    # sort the start positions by the suffix that begins at each one
    sarray = sorted(range(len(text)), key = lambda i: text[i:])
    print ", ".join([str(x) for x in sarray])

if __name__ == "__main__":
    ComputeArray("AACGATAGCGGTAAACGATAGCGGTAGA$")
It correctly outputs the desired array:
28, 27, 12, 0, 13, 1, 14, 25, 6, 19, 4, 17, 2, 15, 8, 21, 26, 3, 16, 7, 20, 9, 22, 10, 23, 11, 24, 5, 18
How could I improve the line
sarray = sorted(range(len(text)), key = lambda i: text[i:])
so that, as the length of text increases, the keys produced by the lambda expression do not use so much memory? With a longer text I get:
Traceback (most recent call last):
  File "C:\Array.py", line 23, in <module>
    ComputeArray(text)
  File "C:\Array.py", line 11, in ComputeArray
    sarray = sorted(range(len(text)), key = lambda i: text[i:])
  File "C:\Array.py", line 11, in <lambda>
    sarray = sorted(range(len(text)), key = lambda i: text[i:])
MemoryError
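For scale, here is a rough estimate of why the keys get so large. In CPython, slicing a string copies it, and sorted() computes every key before sorting and holds them all until the sort finishes, so all n suffixes exist in memory at once. The helper name suffix_key_chars is only illustrative, and the count ignores per-object overhead:

```python
def suffix_key_chars(n):
    """Total characters held by all n suffix slices of a length-n string."""
    # Suffix i has length n - i, so the total is n + (n - 1) + ... + 1.
    return n * (n + 1) // 2

for n in (29, 10000, 1000000):
    print("%d -> %d" % (n, suffix_key_chars(n)))
# 29 -> 435
# 10000 -> 50005000
# 1000000 -> 500000500000
```

So the key storage grows quadratically with the text length, which would be consistent with the MemoryError above.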
UPDATE
I also tried another version that builds the list of suffixes explicitly:
sarray=[]
for i in range(len(text)):
    sarray.append(text[i:])
order=[i[0] for i in sorted(enumerate(sarray), key=lambda x:x[1])]
print ", ".join([str(x) for x in order])
However, this also takes too much memory.
I also tried a solution using the pysuffix library, available at https://code.google.com/p/pysuffix/:
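# assumed import: the Karkkainen-Sanders module shipped with pysuffix
import tools_karkkainen_sanders as tks

s = 'AACGATAGCGGTAGA'
s = unicode(s,'utf-8','replace')
n = len(s)
sa = tks.simple_kark_sort(s)
lcp = tks.LCP(s,sa)
print n
print sa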
Although it solves the problem, it takes too much time with larger strings. Do you know of another library, or a method to make the suffix array construction more efficient?
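For comparison, one standard technique that never copies suffixes is prefix doubling: positions are sorted by a pair of integer ranks, and the compared prefix length doubles each round. The following is a minimal sketch under that assumption, not a tuned implementation and not benchmarked on large inputs; the name suffix_array_doubling is hypothetical.

```python
def suffix_array_doubling(text):
    """Return the suffix array of text using prefix doubling."""
    n = len(text)
    if n == 0:
        return []
    # Initial rank of each suffix is the code of its first character.
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        # Key is (rank of the first k chars, rank of the next k chars);
        # -1 means "past the end" and sorts before every real rank.
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        # Re-rank: suffixes with equal key pairs share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            differs = key(sa[j]) != key(sa[j - 1])
            new_rank[sa[j]] = new_rank[sa[j - 1]] + differs
        rank = new_rank
        if rank[sa[-1]] == n - 1:
            # All ranks are distinct, so the order is final.
            break
        k *= 2
    return sa

text = "AACGATAGCGGTAAACGATAGCGGTAGA$"
print(", ".join(str(i) for i in suffix_array_doubling(text)))
```

Each round sorts n small tuples of integers, so the extra memory stays proportional to the length of text rather than to the total length of all suffixes. The number of rounds is at most about log2(n), since the compared prefix length doubles each time.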