0

I'm trying to write python code that will take a string and a length, and search through the string to tell me which sub-string of that particular length occurs the most, prioritizing the first if there's a tie.

For example, "cadabra abra" 2 should return ab

I tried:

import sys

def main():
    inputstring = str(sys.argv[1])
    length = int(sys.argv[2])
    Analyze(inputstring, length)    


def Analyze(inputstring, length):
    count = 0;
    runningcount = -1;
    sequence = ""
    substring = ""
    for i in range(0, len(inputstring)):    
        substring = inputstring[i:i+length]
        for j in range(i+length,len(inputstring)):
            #print(runningcount)
            if inputstring[j:j+2] == substring:
                print("runcount++")
                runningcount += 1
                print(runningcount)         
                if runningcount > count:
                    count = runningcount
                    sequence = substring


    print(sequence)             


main()

But can't seem to get it to work. I know I'm at least doing something wrong with the counts, but I'm not sure what. This is my first program in Python too, but I think my problem is probably more with the algorithm than the syntax.

Iron Fist
  • 10,739
  • 2
  • 18
  • 34
Austin
  • 6,921
  • 12
  • 73
  • 138
  • why `if inputstring[j:j+2] == substring:` ? shouldn't be `if inputstring[j:j+length] == substring:` instead? – Iron Fist Feb 06 '16 at 04:47
  • 1
    Check out this answer to a similar question: http://stackoverflow.com/a/14670769/1795128 . Using the Counter class simplifies the problem a lot – Zack Graber Feb 06 '16 at 04:47
  • Yea it would be j+length, thanks. I'll try taking a look at counter class, but I was trying to do this without researching too much python specific stuff yet – Austin Feb 06 '16 at 04:50
  • Do you count overlapping strings? E.g., should `'aaabbcbb' 2` return `'aa'` (occurring twice counting the overlap, and beating `'bb'` by occurring earlier), or should it return `'bb'`? Since you accepted Iron Fist's answer below, it looks like you do want to count the overlap, but that's not clear from the problem description. – gil Feb 06 '16 at 06:46

3 Answers3

2

Try to use built-in method, they will make your life easier, this way:

>>> s = "cadabra abra"
>>> x = 2
>>> l = [s[i:i+x] for i in range(len(s)-x+1)]
>>> l
['ca', 'ad', 'da', 'ab', 'br', 'ra', 'a ', ' a', 'ab', 'br', 'ra']
>>> max(l, key=lambda m:s.count(m))
'ab'

EDIT:

Much simpler syntax as per Stefan Pochmann comment:

>>> max(l, key=s.count)
Community
  • 1
  • 1
Iron Fist
  • 10,739
  • 2
  • 18
  • 34
1
import sys
from collections import OrderedDict

def main():
    inputstring = sys.argv[1]
    length = int(sys.argv[2])
    analyze(inputstring, length)

def analyze(inputstring, length):
    d = OrderedDict()
    for i in range(0, len(inputstring) - length + 1):    
        substring = inputstring[i:i+length]
        if substring in d:
            d[substring] += 1
        else:
            d[substring] = 1
    maxlength = max(d.values())
    for k,v in d.items():
        if v == maxlength:
            print(k)
            break

main()
John Gordon
  • 29,573
  • 7
  • 33
  • 58
0

Pretty good stab at a solution for a first Python program. As you learn the language, spend some time reading the excellent documentation. It is full of examples and tips.

For example, the standard library includes a Counter class for counting things (obviously) and an OrderedDict class which remebers the ording in which keys are entered. But the documentation includes an example that combines the two to make an OrderedCounter, which can be used to solve you problem like this:

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

def analyze(s, n):
    substrings = (s[i:i+n] for i in range(len(s)-n+1))
    counts = OrderedCounter(substrings)
    return max(counts.keys(), key=counts.__getitem__)

analyze("cadabra abra", 2)
RootTwo
  • 4,288
  • 1
  • 11
  • 15