Finding sub patterns in list of strings

Question

I have a list of strings:

mylist=["3>3>4>5","2>2>4","3>3>5>6>2","2>2>4>5", "4>5>5"]

I want to be able to find the sub patterns in this list.

For example, passing this list through the pattern finder should return

{"3>3>":["3>3>4>5","3>3>5>6>2"], "2>2>4":["2>2>4","2>2>4>5"]}

Currently, I am able to group the list by the first letter of each string. Passing mylist through find_sub_pattern results in

[["3>3>4>5","3>3>5>6>2"],["2>2>4","2>2>4>5"]]

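from collections import Counter

def find_sub_pattern(data=None):
    # Avoid a mutable default argument.
    data = data or []
    # Collect the first character of every string.
    first_letter = [row[0] for row in data]

    # Counter stands in here for get_list_freq, which the post does not define.
    list_freq = Counter(first_letter)
    # Keep only the first characters that occur more than once.
    matched_first = [key for key, value in list_freq.items() if value > 1]
    if not matched_first:
        return "No pattern match"
    # Build one group of strings per repeated first character.
    matched_array = []
    for letter in matched_first:
        matched_array.append([x for x in data if x[0] == letter])
    print(matched_array)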

Should the subpatterns be found automatically or are they defined in advance? — fafl, Jan 13 '17 at 11:34
The subpatterns must be found automatically with no defined pattern @fafl — Dennis Djan, Jan 13 '17 at 11:36
@doctorlove the pattern refers to the substring common to two or more strings in the list — Dennis Djan, Jan 13 '17 at 11:37
So what if you had `"3>3>4>5", "3>3>4>6", "5>3>4", "3>3>6"`? What sub-patterns would you expect? — cdarke, Jan 13 '17 at 11:41
It will be hard to do without knowing the length of the common sub-string - for instance, should all strings starting with `"3>"` be in a group? And what happens to the `"4>5>5"` in your example? Think about doing a [tree](http://stackoverflow.com/questions/2461170/tree-implementation-in-python) of some kind with your lists, might be what you want. — berna1111, Jan 13 '17 at 11:41
What about `"3>3"` and `"3>4"`? Those are repeated too. Can you define what a sub pattern is? — cdarke, Jan 13 '17 at 11:45
The sub pattern will be the pattern common to all the list items, starting from the first letter of each item @cdarke — Dennis Djan, Jan 13 '17 at 11:48
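
One reading of that definition is a plain common-prefix computation. As a minimal sketch (not from the thread), the standard library's `os.path.commonprefix` compares strings character by character, which happens to reproduce the keys in the question's expected output. The variable names below are illustrative only.

```python
import os.path

# Two of the groups from the question, already grouped by first character.
threes = ["3>3>4>5", "3>3>5>6>2"]
twos = ["2>2>4", "2>2>4>5"]

# commonprefix works on characters, so a trailing ">" can be part of the result.
prefix_threes = os.path.commonprefix(threes)
prefix_twos = os.path.commonprefix(twos)
print(prefix_threes)  # 3>3>
print(prefix_twos)    # 2>2>4

# Caveat: a character-wise comparison can cut a multi-digit element in half.
split_token = os.path.commonprefix(["12>3", "13>4"])
print(split_token)    # 1
```

Because of that caveat, comparing the `">"`-separated elements rather than the raw characters is the safer choice if elements can be longer than one character.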

score 1 · Accepted Answer · answered Jan 13 '17 at 12:40

This does what you want:

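from functools import reduce  # reduce is a builtin only on Python 2

def common_start(sa, sb):
    # Return the leading elements that two sequences share, as a list.
    def _iter():
        for a, b in zip(sa, sb):
            if a != b:
                return
            yield a
    return list(_iter())

l = ["3>3>4>5","2>2>4","3>3>5>6>2","2>2>4>5", "4>5>5"]
# Split each string into its ">"-separated elements.
elems = [x.split(">") for x in l]
# Group the split strings by their first element.
groups = [[x for x in elems if x[0] == group] for group in {x[0] for x in elems}]
# Map each group's common prefix to its members, skipping groups of one.
result = {
    ">".join(reduce(common_start, group)):
    [">".join(x) for x in group] for group in groups if 1 < len(group)
}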
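
For reuse, the same approach can be wrapped in a function. The following is a sketch under assumptions rather than the answerer's code: the names `common_prefix` and `find_sub_patterns` and the `sep` parameter are hypothetical, and it assumes Python 3.7+ so that dictionaries keep insertion order.

```python
from collections import defaultdict
from itertools import takewhile


def common_prefix(parts_a, parts_b):
    """Return the leading elements shared by two lists (hypothetical helper)."""
    pairs = takewhile(lambda pair: pair[0] == pair[1], zip(parts_a, parts_b))
    return [a for a, _ in pairs]


def find_sub_patterns(strings, sep=">"):
    """Group strings by first element and key each group by its common prefix."""
    groups = defaultdict(list)
    for s in strings:
        parts = s.split(sep)
        groups[parts[0]].append(parts)

    result = {}
    for members in groups.values():
        if len(members) < 2:
            continue  # a unique first element has nothing to share
        # Narrow the prefix against every other member of the group.
        prefix = members[0]
        for parts in members[1:]:
            prefix = common_prefix(prefix, parts)
        result[sep.join(prefix)] = [sep.join(parts) for parts in members]
    return result


mylist = ["3>3>4>5", "2>2>4", "3>3>5>6>2", "2>2>4>5", "4>5>5"]
print(find_sub_patterns(mylist))
# {'3>3': ['3>3>4>5', '3>3>5>6>2'], '2>2>4': ['2>2>4', '2>2>4>5']}
```

Note that the keys carry no trailing separator (`"3>3"`, not `"3>3>"`), because the prefix is computed on the split elements and then rejoined.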

Exactly what I was looking for. Thanks — Dennis Djan, Jan 13 '17 at 13:33
