i'll start with an example because i'm not sure i can explain this properly.
the easier part of the problem (i think although i can't get it either):
- take some strings eg. 'Example', 'DOMNode', 'DOMText', 'DOMElement'
- and output '(Example|DOM(Node|Text|Element))'
a more complicated part of the problem is matching from both ends of the strings
take some strings eg. 'Example', 'ArrayIterator', 'RecursiveArrayIterator', 'DirectoryIterator', 'RecursiveDirectoryIterator'
and output '(Example|(Recursive)?(Array|Directory)Iterator)'
i have a list of strings (patterns) to match against a subject.
i could simply concatenate the patterns with alternation (which is my current system) but i want to find a way to group common prefixes into alternation groups.
it's not really that much of an optimization but i've been trying to do it as an exercise for fun and now it's just giving me a headache haha.
i tried breaking each string by letter, matching every possible combination.
can't remember all the things i've tried a.t.m. i'm burning both ends of the candle.
i couldn't find a way to get common prefixes or store them so i could rebuild them into a regex. seems like a simple problem but i'm stuck.
i have this function for separating strings with underscores: (works good when you know how to separate prefixes ie by underscores)
<?php
/**
* separates snake case names into nested hierarchies.
*/
function processArray(array $funcs): array
{
$loop = false;
$current = false;
$newFuncs = [];
foreach ($funcs as $name)
{
$pos = strpos($name, '_');
if ($current and !str_starts_with($name, $current))
{
if ($loop || $pos)
{
$newFuncs[$current] = processArray($newFuncs[$current]);
$loop = false;
}
$current = false;
}
if ($pos)
{
$current = substr($name, 0, $pos + 1);
$newFuncs[$current] ??= [];
$subName = substr($name, $pos + 1);
$newFuncs[$current][] = $subName;
if (strpos($subName, '_'))
{
$loop = true;
}
}
else
{
if ($loop)
{
$newFuncs[$current] = processArray($newFuncs[$current]);
$loop = false;
}
$current = false;
$newFuncs[] = $name;
}
}
return $newFuncs;
}
function getRegex(array $strs): string
{
static $level = 0;
$ret = '(';
foreach ($strs as $key => $value)
{
if (is_array($value))
{
$strs[$key] = (is_string($key)?$key:'').getRegex($value);
}
}
$ret.= implode('|', $strs);
$ret.= ')';
return $ret;
}
$funcs = get_defined_functions()['internal'];
sort($funcs);
$funcs = processArray($funcs);
$getRegex = getRegex($funcs);
//remove isolated groups (groups with only one alternation)
do
{
$getRegex = preg_replace('~\(([a-zA-Z_0-9]+?)\)~', '$1', $getRegex, -1, $count);
}
while ($count);
var_dump($getRegex);
UPDATE
so I nearly solved it except now where an optional part is present, it adds an empty alternation before the optional string which I think will match first and never match the optional part eg My(|Opt)
will not match MyOpt
.
here's what i did:
function processRegexArray(array $funcs): array
{
$loop = false;
$current = false;
$newFuncs = [];
foreach ($funcs as $name)
{
preg_match('~^([A-Z]+(?=[A-Z])|[A-Z0-9_]*[a-z]+)~', $name, $matches);
$pos = strlen($matches[0]??='') -1;
if ($current and !str_starts_with($name, $current))
{
if ($loop || $pos)
{
$newFuncs[$current] = processRegexArray($newFuncs[$current]);
$loop = false;
}
$current = false;
}
if ($pos)
{
$current = substr($name, 0, $pos + 1);
$newFuncs[$current] ??= [];
$subName = substr($name, $pos + 1);
$newFuncs[$current][] = $subName;
preg_match('~^([A-Z]+(?=[A-Z])|[A-Z0-9_]*[a-z]+)~', $subName, $matches);
if ($matches)
{
$loop = true;
}
}
else
{
if ($loop)
{
$newFuncs[$current] = processRegexArray($newFuncs[$current]);
$loop = false;
}
$current = false;
$newFuncs[] = $name;
}
}
return $newFuncs;
}
function getRegex(array $strs): string
{
if (count($strs) === 0)
return '';
static $level = 0;
$ret = '(';
foreach ($strs as $key => $value)
{
if (is_array($value))
{
$strs[$key] = (is_string($key)?$key:'').getRegex($value);
}
}
$ret.= implode('|', $strs);
$ret.= ')';
return $ret;
}
$members = [...get_declared_classes(), ...get_declared_interfaces(), ...get_declared_traits()];
sort($members);
// remove namespaced names for now
foreach ($members as $key => $value)
{
if (strpos($value, '\\') !== false)
{
unset($members[$key]);
}
}
$members = processRegexArray($members);
$getRegex = getRegex($members);
do
{
$getRegex = preg_replace('~\(([^|()]*?)\)~', '$1', $getRegex, -1, $count);
}
while ($count);
var_dump($getRegex);
should output:
string(2302) "(AllowDynamicProperties|AppendIterator|ArgumentCountError|ArithmeticError|Array(Access|Iterator|Object)|AssertionError|Attribute|BackedEnum|Bad(FunctionCallException|MethodCallException)|CURL(File|StringFile)|CachingIterator|CallbackFilterIterator|ClosedGeneratorException|Closure|CompileError|Countable|Curl(Handle|MultiHandle|ShareHandle)|DOM(Attr|CdataSection|CharacterData|ChildNode|Comment|Document|DocumentFragment|DocumentType|Element|Entity|EntityReference|Exception|Implementation|NameSpaceNode|NamedNodeMap|Node|NodeList|Notation|ParentNode|ProcessingInstruction|Text)|DOMXPath|Date(Interval|Period|Time(|Immutable|Interface|Zone))|DeflateContext|Directory(|Iterator)|DivisionByZeroError|DomainException|EmptyIterator|Error(|Exception)|Exception|Fiber(|Error)|FilesystemIterator|FilterIterator|Generator|GlobIterator|HashContext|InfiniteIterator|InflateContext|InternalIterator|InvalidArgumentException|Iterator(|Aggregate|Iterator)|Json(Exception|Serializable)|LengthException|LibXMLError|LimitIterator|LogicException|MultipleIterator|NoRewindIterator|Open(SSL(AsymmetricKey|Certificate|CertificateSigningRequest))|Out(OfBoundsException|OfRangeException)|OuterIterator|OverflowException|PDO|PDO(Exception|Row|Statement)|ParentIterator|ParseError|Phar(|Data|Exception|FileInfo)|PhpToken|RangeException|Rar(Archive|Entry|Exception)|Recursive(ArrayIterator|CachingIterator|CallbackFilterIterator|DirectoryIterator|FilterIterator|Iterator(|Iterator)|RegexIterator|TreeIterator)|Reflection(|Attribute|Class(|Constant)|Enum(|BackedCase|UnitCase)|Exception|Extension|Fiber|Function(|Abstract)|Generator|IntersectionType|Method|NamedType|Object|Parameter|Property|Reference|Type|UnionType|ZendExtension)|Reflector|RegexIterator|ReturnTypeWillChange|RuntimeException|SeekableIterator|Sensitive(Parameter(|Value))|Serializable|Session(Handler(|Interface)|IdInterface|UpdateTimestampHandlerInterface)|Simple(XML(Element|Iterator))|Spl(DoublyLinkedList|File(Info|Object)|FixedArray|Heap|MaxHeap|MinHeap|ObjectStorage|Observer|PriorityQueue|Queue|Stack|Subject|TempFileObject)|Stringable|Throwable|Traversable|TypeError|UnderflowException|UnexpectedValueException|UnhandledMatchError|UnitEnum|ValueError|Weak(Map|Reference)|XML(Parser|Reader|Writer)|__PHP_Incomplete_Class|finfo|php_user_filter|stdClass)"