0

I am fairly new to Programming world. I am trying to create a common regex that would match only list of strings given, nothing more than that.

For Eg., given the below list

List = ['starguide,'snoreguide','snoraguide','smarguides']

It should create a regex like this - s(((tar|nor(e|a))(guide))|marguides)

I implemented a trie. Could only manage to get s(marguides|nor(aguide|eguide)|targuide)

I want my regex to be shortened (common suffixes tied together). Is there any better way to shorten the regex I am getting from the trie?

getuj
  • 17
  • 2
  • Easy peasy 'l'.join(mylist) in python – Serge Oct 22 '18 at 13:36
  • See automata minimizatio n algorithms – Serge Oct 22 '18 at 13:38
  • [This post](https://stackoverflow.com/a/42789508/3832970) has all the details and code you need. Copy/paste. – Wiktor Stribiżew Oct 22 '18 at 13:38
  • @WiktorStribiżew: I tried implementing your code, but it gives `s(marguides|nor(aguide|eguide)|targuide)` . – getuj Oct 22 '18 at 13:52
  • Do you aim at shorter regex or faster – Serge Oct 22 '18 at 14:00
  • For shorter read about nd automata minimization – Serge Oct 22 '18 at 14:01
  • Which is hard problem and regex engine still is likely to determinize the underlying state machine / acceptor internally https://cstheory.stackexchange.com/questions/31630/how-can-one-actually-minimize-a-regular-expression – Serge Oct 22 '18 at 14:22
  • Dsm minization method will give reex you want – Serge Oct 22 '18 at 14:26
  • Please open the question, it is seem not be about the performance, addressed by a previous topic, but minimizing regex itself – Serge Oct 22 '18 at 14:34
  • Try https://github.com/siddharthasahu/automata-from-regex to build min deterministic state machine/automaton, then build optimized regex with then build optimized regex with https://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions – Serge Oct 22 '18 at 15:20
  • above will produce your expected output ( though in general case, non-determinist automata could yield shorter regex) – Serge Oct 22 '18 at 15:21
  • also see https://stackoverflow.com/questions/15807365/find-simplest-regular-expression-matching-all-given-strings – Serge Oct 22 '18 at 15:32
  • @Serge Thankyou for the pointers! Will look into them. I reopened my question. – getuj Oct 22 '18 at 17:40

1 Answers1

0

To get the desired result try use automata minimization.

For your simple example, deterministic automaton suffices.

Use github.com/siddharthasahu/automata-from-regex to build min deterministic state machine/automaton from trivial regex (enumeration of words), then transform automaton into regex (it is easy for acyclic automata, http://www-igm.univ-mlv.fr/~dr/thdr/ www.dcc.fc.up.pt/~nam/publica/extAbsCIAA05.pdf) see also https://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions

In general case, non-determinist automata could yield shorter regex, yet it is a hard problem https://cstheory.stackexchange.com/questions/31630/how-can-one-actually-minimize-a-regular-expression

Serge
  • 3,387
  • 3
  • 16
  • 34