Suppose I have a list of very simple regexes represented as strings (by "very simple", I mean that the only special construct they contain is .*). Every string in the list starts and ends with .*. For example, I could have
rs = ['.*a.*', '.*ab.*', '.*ba.*cd.*', ...]
What I would like to do is identify those patterns whose set of matches is a subset of another pattern's. In this example, .*a.* matches everything .*ab.* does, and more. Hence, I consider the latter pattern to be redundant.
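To make the notion of "redundant" concrete, here is a small illustration using Python's re module. The sample strings and the variable names (broad, narrow, samples) are made up for this example, and checking a handful of strings only illustrates the relationship; it does not prove it.

```python
import re

# The two patterns from the example above.
broad = re.compile(r".*a.*")
narrow = re.compile(r".*ab.*")

# Hypothetical sample strings, chosen only for illustration.
samples = ["ab", "cab", "a", "xa", "b", ""]

for s in samples:
    print(repr(s), bool(broad.fullmatch(s)), bool(narrow.fullmatch(s)))

# Every sample matched by the narrow pattern is also matched by the broad one.
narrow_implies_broad = all(
    broad.fullmatch(s) for s in samples if narrow.fullmatch(s)
)

# Samples matched only by the broad pattern (the "and more" part).
broad_only = [s for s in samples if broad.fullmatch(s) and not narrow.fullmatch(s)]
```

Here fullmatch is used so that the leading and trailing .* actually matter; with search, the pattern would be allowed to match anywhere in the string regardless.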
What I thought to do was to split the strings on .*, match up corresponding elements, and test whether one startswith the other. More specifically, consider .*a.* and .*ab.*. Splitting these on .* gives
a = ['', 'a', '']
b = ['', 'ab', '']
and zipping them together gives
c = [('', ''), ('a', 'ab'), ('', '')]
And then,
all(elt[1].startswith(elt[0]) for elt in c)
returns True, and so I conclude that .*ab.* is indeed redundant if .*a.* is included in the list.
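Put together, the check described above could be sketched as follows. The function name looks_redundant and its parameter names are made up for this example, and it implements only the split/zip/startswith heuristic exactly as stated, not a general test for one pattern's matches being a subset of another's.

```python
def looks_redundant(general, specific):
    """Heuristic from the text above: True if every piece of `specific`
    starts with the corresponding piece of `general`."""
    general_parts = general.split(".*")    # '.*a.*'  -> ['', 'a', '']
    specific_parts = specific.split(".*")  # '.*ab.*' -> ['', 'ab', '']
    # zip stops at the shorter list, so extra pieces are silently ignored.
    pairs = zip(general_parts, specific_parts)
    return all(s.startswith(g) for g, s in pairs)


print(looks_redundant(".*a.*", ".*ab.*"))       # the example from the text
print(looks_redundant(".*ab.*", ".*a.*"))       # the reverse direction
print(looks_redundant(".*a.*", ".*ba.*cd.*"))   # patterns with different piece counts
```

The first call returns True and the second False, as expected. The third call returns False because 'ba' does not start with 'a', even though every string containing "ba" also contains "a"; this is one of the cases where comparing pieces position by position gets complicated.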
Does this make sense, and does it do what I am trying to do? Of course, this approach gets complicated for a number of reasons, so my next question is: is there a better way to do this that anyone has encountered previously?