Simplifying for-if messes with better structure?

Question

Please, move this question to Code Review -area. It is better suited there because I know the code below is junk and I want critical feedback to complete rewrite. I am pretty much reinventing the wheel.

# Description: you are given a bitwise pattern and a string
# you need to find the number of times the pattern matches in the string.
# The pattern is determined by markov chain.
# For simplicity, suppose the ones and zeros as unbiased coin flipping
# that stops as it hits the pattern, below.
#
# Any one liner or simple pythonic solution?

import random

def matchIt(yourString, yourPattern):
        """find the number of times yourPattern occurs in yourString"""

        count = 0
        matchTimes = 0

        # How can you simplify the for-if structures?
        # THIS IS AN EXAMPLE HOW NOT TO DO IT, hence Code-Smell-label
        # please, read clarifications in [Update]

        for coin in yourString:
            #return to base
            if  count == len(pattern):
                    matchTimes = matchTimes + 1
                    count = 0

            #special case to return to 2, there could be more this type of conditions
            #so this type of if-conditionals are screaming for a havoc
            if count == 2 and pattern[count] == 1:
                    count = count - 1

            #the work horse
            #it could be simpler by breaking the intial string of lenght 'l'
            #to blocks of pattern-length, the number of them is 'l - len(pattern)-1'
            if coin == pattern[count]:
                    count=count+1

        average = len(yourString)/matchTimes

        return [average, matchTimes]



# Generates the list
myString =[]
for x in range(10000):
    myString= myString + [int(random.random()*2)]

pattern = [1,0,0]
result = matchIt(myString, pattern)

print("The sample had "+str(result[1])+" matches and its size was "+str(len(myString))+".\n" +
        "So it took "+str(result[0])+" steps in average.\n" +
        "RESULT: "+str([a for a in "FAILURE" if result[0] != 8]))


# Sample Output
# 
# The sample had 1656 matches and its size was 10000.
# So it took 6 steps in average.
# RESULT: ['F', 'A', 'I', 'L', 'U', 'R', 'E']

[Update]

I will explain here a bit about theory, perhaps, the problem can be simplified that way. The above code try to construct the markov chain with transition matrix A below. The pattern 100 that you can imagine as coin flipping corresponds to it.

>>> Q=numpy.matrix('0.5 0.5 0; 0 0.5 0.5; 0 0.5 0')     
>>> I=numpy.identity(3)
>>> I
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
>>> Q
matrix([[ 0.5,  0.5,  0. ],
        [ 0. ,  0.5,  0.5],
        [ 0. ,  0.5,  0. ]])
>>> A=numpy.matrix('0.5 0.5 0 0; 0 0.5 0.5 0; 0 0.5 0 0.5; 0 0 0 1')
>>> A
matrix([[ 0.5,  0.5,  0. ,  0. ],
        [ 0. ,  0.5,  0.5,  0. ],
        [ 0. ,  0.5,  0. ,  0.5],
        [ 0. ,  0. ,  0. ,  1. ]])

The average 8 in the question becomes the sum of values on the first row in the matrix N=(I-Q)^-1 where Q above.

>>> (I-Q)**-1
matrix([[ 2.,  4.,  2.],
        [ 0.,  4.,  2.],
        [ 0.,  2.,  2.]])
>>> numpy.sum(((I-Q)**-1)[0])
8.0

Now, you can probably see that this apparently-only-pattern-matching-problem becomes a markov chain. I cannot see a reason why you could not substitute the messy for-while-if conditions with something similar to matrices or matrices. I don't know how to implement them but iterators could be a way to go, researching, particularly with more states where you need to decompose.

But a problem emerged with Numpy, what are the things -Inf and NaN for? Check the values to which they should converge, above, from (I-Q)**-1 matrix. The N is from N=I+Q+Q^2+Q^3+...=\frac{I-Q^{n}}{I-Q}.

>>> (I-Q**99)/(I-Q)
matrix([[  2.00000000e+00,   1.80853571e-09,             -Inf],
        [             NaN,   2.00000000e+00,   6.90799171e-10],
        [             NaN,   6.90799171e-10,   1.00000000e+00]])
>>> (I-Q**10)/(I-Q)
matrix([[ 1.99804688,  0.27929688,        -Inf],
        [        NaN,  1.82617188,  0.10742188],
        [        NaN,  0.10742188,  0.96679688]])

Is this a homework or interview assignment? If so, please tag it as homework. — gotgenes, Jan 12 '11 at 16:02
Also, the use of triple-quoted strings as comments is atrocious. They should be true comments (i.e., prefixed with `#`). — gotgenes, Jan 12 '11 at 16:04
gotgenes: no, it is me doing something else than math. This problem could be killed with better structure. — hhh, Jan 12 '11 at 16:05
Using a single triple-quotes string as a comment would at least make some sense. But this way, you're wasting 4 or 5 characters! — , Jan 12 '11 at 16:06
Docstrings should be put _inside_ the function, not outside it. — kennytm, Jan 12 '11 at 16:11
Could you post sample input and sample output? Also, you are using `myString` in your function, but that's an argument to the function. — milkypostman, Jan 12 '11 at 17:25
There's a comment saying "special case to return to 2" associated with the following code `if count == 2 and pattern[count] == 1`. What's so special about that case? — martineau, Jan 12 '11 at 18:09
martineau: nothing to me but tried to point it out that this kind of conditionals can become tricky with if-for-structures. Really give me a second and I have solved this myself with some maps — hhh, Jan 12 '11 at 18:38

eumiro · Answer 1 · 2011-01-12T19:33:37.143

2

def matchIt(yourString, yourPattern):
        """find the number of times yourPattern occurs in yourString"""

Are you allowed to use the following?

yourString.count(yourPattern)

In your case you could create myString as a real string of 10000 characters and the pattern also as a string and then count the occurence in a simple pythonic way.

EDIT

A one-liner that gives you the number of (overlapping) occurences of pattern in text (which can be either a string or a list), could look like this:

nbOccurences = sum(1 for i in xrange(len(text)-len(pattern)) if text[i:i+len(pattern)] == pattern)

edited Jan 12 '11 at 19:33

answered Jan 12 '11 at 18:12

eumiro

207,213
34
299
261

N.B. That gives non-overlapping occurences, i.e. `"abababcd".count("abab")` gives 1. That may or may not be what the poster is after, it's really quite hard to tell. – Thomas K Jan 12 '11 at 18:25
The definition of the M.C. will infer overlapping occurences, so yes I want overlapping occurrences as well. – hhh Jan 12 '11 at 18:37
@HH - I have edited my answer to include overlapping occurences as well... – eumiro Jan 12 '11 at 19:34

Hugh Bothwell · Accepted Answer · 2011-01-12T23:10:54.017

Ok - a standard(-ish) string search:

def matchIt(needle, haystack):
    """
    @param needle:   string, text to seek
    @param haystack: string, text to search in

    Return number of times needle is found in haystack,
        allowing overlapping instances.

    Example: matchIt('abab','ababababab') -> 4
    """
    lastSeenAt = -1
    timesSeen = 0
    while True:
        nextSeen = haystack.find(needle, lastSeenAt+1)
        if nextSeen==-1:
            return timesSeen
        else:
            lastSeenAt = nextSeen
            timesSeen += 1

but you want to do this to a list of numbers? No problem; we just need to make a list class with a find() method, like so:

import itertools
class FindableList(list):
    def find(self, sub, start=None, end=None):
        """
        @param sub: list, pattern to look for in self

        @param start: int, first possible start-of-list
            If not specified, start at first item

        @param: end: int, last+1 possible start-of-list
            If not specified, end such that entire self is searched

        Returns;
            Starting offset if a match is found, else -1
        """
        if start is None or start < 0:
            start = 0

        # N.B. If end is allowed to be too high,
        # zip() will silently truncate the list comparison
        # and you will probably get extra spurious matches.
        lastEnd = len(self) - len(sub) + 1
        if end is None or end > lastEnd:
            end = lastEnd

        rng = xrange if xrange else range
        iz  = itertools.izip
        isl = itertools.islice

        for pos in rng(start, end):
            if all(a==b for a,b in iz(sub, isl(self, pos, end))):
                return pos

        # no match found
        return -1

then the example looks like

matchIt([1,2,1,2], FindableList([1,2,1,2,1,2,1,2,1,2])) -> 4

and your code becomes:

# Generate a list
randIn = lambda x: int(x*random.random())
myString =[randIn(2) for i in range(10000)]

pattern = [1,0,0]
result = matchIt(pattern, myString)

print("The sample had {0} matches and its size was {1}.\n".format(result, len(myString)))

score 0 · Answer 3 · edited May 23 '17 at 12:25

This is not ready.

Similar question but main focus on graph libraries here and similar question but in C#, maybe useful.

The files that are relevant to this question are ./networkx/generators/degree_seq.py (997 lines, about generating graps with given degree sequence) and ./networkx/algorithms/mixing.py (line 20, function degree_assortativity(G) about probability based graphs) and also note that its source code refer to 92 references, not sure whether you want to reinvent the wheel. For igraph, please, read the line 835 of the file convert.c about weighted edges. You can get the source for Networkx here and the source for igraph here. Please, note that the former is under BSD license and done in Python while igraph under GNU (GPL) and done in C.

To get started with Networkx, a helpful line about creating weighted graph from its jUnits test_convert_scipy.py -file:

def create_weighted(self, G): 
    g = cycle_graph(4)
    e = g.edges()
    source = [u for u,v in e]
    dest = [v for u,v in e]
    weight = [s+10 for s in source]
    ex = zip(source, dest, weight)
    G.add_weighted_edges_from(ex)
    return G

So to create your Markov Chain, help about directed weighted graph here, something like this perhaps:

>>> DG=nx.DiGraph()
>>> DG.add_weighted_edges_from([(0,0,0.5),(1,1,0.5),(3,3,1),(0,1,0.5),(1,2,0.5),(2,3,0.5), (2,1,0.5)])

or perhaps there is some ready markov chain generation tool as there are for some other stochastic processes, more here. Cannot find an algorithm to analyze the graph with excepted value or do trials with different sets as in your example, perhaps there is none, and you must stick with other repliers' solutions.

Simplifying for-if messes with better structure?

3 Answers3