1

I currently have 6 separate for loops which iterate over a list of numbers looking to match specific sequences of numbers within larger sequences, and replace them like this:

[...0,1,0...] => [...0,0,0...]
[...0,1,1,0...] => [...0,0,0,0...]
[...0,1,1,1,0...] => [...0,0,0,0,0...]

And their inverse:

[...1,0,1...] => [...1,1,1...]
[...1,0,0,1...] => [...1,1,1,1...]
[...1,0,0,0,1...] => [...1,1,1,1,1...]

My existing code is like this:

for i in range(len(output_array)-2):
    if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 0:
        output_array[i+1] = 0

for i in range(len(output_array)-3):
    if output_array[i] == 0 and output_array[i+1] == 1 and output_array[i+2] == 1 and output_array[i+3] == 0:
        output_array[i+1], output_array[i+2] = 0

In total I'm iterating over the same output_array 6 times, using brute force checking. Is there a faster method?

Rahul Madhavan
  • 304
  • 3
  • 10
ugotchi
  • 1,003
  • 1
  • 12
  • 32
  • Your question is unclear. Share some code and input & output examples – Ohad Eytan Aug 16 '16 at 08:37
  • Regarding time complexity, I would start with the `timeit` module to measure different implementations. If you need more detail about where your function spends the most time, use a [profiler](https://docs.python.org/3/library/profile.html). The theoretical complexity is often not a good guide because it depends very much on the assumed implementations and their complexity. Python's builtin data structures are highly optimized and may use a completely different implementation than you assume. – Malte Aug 16 '16 at 09:04
  • Could you add some examples ? For instance, what should `01011010` become ? Is it `00011110` ? – 3kt Aug 16 '16 at 09:22

2 Answers2

3
# I would create a map between the string searched and the new one.

patterns = {}
patterns['010'] = '000'
patterns['0110'] = '0000'
patterns['01110'] = '00000'

# I would loop over the lists

lists = [[0,1,0,0,1,1,0,0,1,1,1,0]]

for lista in lists:

    # i would join the list elements as a string
    string_list = ''.join(map(str,lista))

    # we loop over the patterns
    for pattern,value in patterns.items():

        # if a pattern is detected, we replace it
        string_list = string_list.replace(pattern, value)
        lista = list(string_list)
    print lista
Luis González
  • 3,199
  • 26
  • 43
  • 1
    I like this method, this seems like it could work and be more efficient. – ugotchi Aug 16 '16 at 09:22
  • 1
    Consider iterating over `patterns.items()` to avoid the explicit lookup of `patterns[pattern]` inside the loop. – Malte Aug 16 '16 at 11:11
1

While this question related to the questions Here and Here, the question from OP relates to fast searching of multiple sequences at once. While the accepted answer works well, we may not want to loop through all the search sequences for every sub-iteration of the base sequence.

Below is an algo which checks for a sequence of i ints only if the sequence of (i-1) ints is present in the base sequence

# This is the driver function which takes in a) the search sequences and 
# replacements as a dictionary and b) the full sequence list in which to search 

def findSeqswithinSeq(searchSequences,baseSequence):
    seqkeys = [[int(i) for i in elem.split(",")] for elem in searchSequences]
    maxlen = max([len(elem) for elem in seqkeys])
    decisiontree = getdecisiontree(seqkeys)
    i = 0
    while i < len(baseSequence):
        (increment,replacement) = get_increment_replacement(decisiontree,baseSequence[i:i+maxlen])
        if replacement != -1:
            baseSequence[i:i+len(replacement)] = searchSequences[",".join(map(str,replacement))]
        i +=increment
    return  baseSequence

#the following function gives the dictionary of intermediate sequences allowed
def getdecisiontree(searchsequences):
    dtree = {}
    for elem in searchsequences:
        for i in range(len(elem)):
            if i+1 == len(elem):
                dtree[",".join(map(str,elem[:i+1]))] = True
            else:
                dtree[",".join(map(str,elem[:i+1]))] = False
    return dtree

# the following is the function does most of the work giving us a) how many
# positions we can skip in the search and b)whether the search seq was found
def get_increment_replacement(decisiontree,sequence):
    if str(sequence[0]) not in decisiontree:
        return (1,-1)
    for i in range(1,len(sequence)):
        key = ",".join(map(str,sequence[:i+1]))
        if key not in decisiontree:
            return (1,-1)
        elif decisiontree[key] == True:
            key = [int(i) for i in key.split(",")]
            return (len(key),key)
    return 1, -1

You can test the above code with this snippet:

if __name__ == "__main__":
    inputlist = [5,4,0,1,1,1,0,2,0,1,0,99,15,1,0,1]
    patternsandrepls = {'0,1,0':[0,0,0],
                        '0,1,1,0':[0,0,0,0],
                        '0,1,1,1,0':[0,0,0,0,0],
                        '1,0,1':[1,1,1],
                        '1,0,0,1':[1,1,1,1],
                        '1,0,0,0,1':[1,1,1,1,1]}

    print(findSeqswithinSeq(patternsandrepls,inputlist))

The proposed solution represents the sequences to be searched as a decision tree.

Due to skipping the many of the search points, we should be able to do better than O(m*n) with this method (where m is number of search sequences and n is length of base sequence.

EDIT: Changed answer based on more clarity in edited question.

Community
  • 1
  • 1
Rahul Madhavan
  • 304
  • 3
  • 10
  • "the fastests way to achieve this" - achieve what? The OP was unclear about the intention of the replacements, but clear that they should be applied to another list. – Malte Aug 16 '16 at 09:01
  • @ugotchi, does the above code work for what you need? – Rahul Madhavan Aug 17 '16 at 17:46