I have a list of strings:

mylist=["3>3>4>5","2>2>4","3>3>5>6>2","2>2>4>5", "4>5>5"]

I want to be able to find the common sub-patterns in this list.

For example, the final result of this after passing through the pattern finder should return

{"3>3>": ["3>3>4>5", "3>3>5>6>2"], "2>2>4": ["2>2>4", "2>2>4>5"]}

Currently, I am able to group the list by the first character of each string. Passing mylist through find_sub_pattern results in

[["3>3>4>5", "3>3>5>6>2"], ["2>2>4", "2>2>4>5"]]

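from collections import Counter

def find_sub_pattern(data=None):
    data = data or []  # avoid a mutable default argument
    first_letter = []
    for row in data:
        first_letter.append(row[0])

    # Counter stands in for get_list_freq, whose definition was not shown.
    list_freq = Counter(first_letter)
    matched_first = []
    for key, value in list_freq.items():
        if value > 1:
            matched_first.append(key)
    if matched_first == []:
        return "No pattern match"
    # One group per first character shared by more than one string.
    matched_array = []
    for p in range(0, len(matched_first)):
        matched_array.append([x for x in data if x[0] == matched_first[p]])
    print(matched_array)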
Reegan Miranda

1 Answer

This does what you want:

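from functools import reduce  # reduce is not a builtin in Python 3

def common_start(sa, sb):
    # Collect the leading items that both sequences share.
    def _iter():
        for a, b in zip(sa, sb):
            if a != b:
                return
            yield a
    return list(_iter())

l = ["3>3>4>5","2>2>4","3>3>5>6>2","2>2>4>5", "4>5>5"]
# Split each string into its tokens.
elems = [x.split(">") for x in l]
# Group the token lists by their first token.
groups = [[x for x in elems if x[0] == group] for group in {x[0] for x in elems}]
# Key each group with more than one member by its common prefix.
result = {
    ">".join(reduce(common_start, group)):
    [">".join(x) for x in group] for group in groups if 1 < len(group)
}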
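
If you would rather have this as a reusable function, here is a rough sketch of the same idea. The names group_by_prefix and common_prefix and the sep parameter are made up for illustration and do not come from the code above. It groups by first token in a single pass and keeps the groups in input order, which the set-based version does not guarantee. Note that the keys come out without a trailing separator, so you get "3>3" rather than the "3>3>" shown in the question.

```python
from collections import defaultdict
from functools import reduce


def common_prefix(a, b):
    """Return the longest shared leading run of two token lists."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


def group_by_prefix(strings, sep=">"):
    """Group strings by first token and key each group by its common prefix."""
    groups = defaultdict(list)  # first token -> list of token lists
    for s in strings:
        tokens = s.split(sep)
        groups[tokens[0]].append(tokens)

    result = {}
    for members in groups.values():
        if len(members) < 2:
            continue  # a lone string shares no pattern with anything
        key = sep.join(reduce(common_prefix, members))
        result[key] = [sep.join(m) for m in members]
    return result


mylist = ["3>3>4>5", "2>2>4", "3>3>5>6>2", "2>2>4>5", "4>5>5"]
print(group_by_prefix(mylist))
```

Like the code above, this only compares strings that share a first token, so it will not find patterns that start in the middle of a string.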
fafl