3

For the Google Prettify syntax highlighter for the Wolfram Language, I need to match all identifiers against a large list of about 7000 built-in function names to highlight them as keywords. In the past, I simply used a regex consisting of many alternations. To give a concrete example, here are all functions that start with Plot:

(:?Plot|Plot3D|Plot3Matrix|PlotDivision|PlotJoined|PlotLabel|PlotLabels|PlotLayout|PlotLegends|PlotMarkers|PlotPoints|PlotRange|PlotRangeClipping|PlotRangeClipPlanesStyle|PlotRangePadding|PlotRegion|PlotStyle|PlotTheme)

Theoretically, this is a bad choice because sub-matches need to be done repeatedly for each alternative keyword. What one could do is to build a Trie (or prefix tree) and construct a regex from it. This would skip whole all sub-matches if the prefix is not correct. For the above words, this would look like this

(:?Plot(?:3(?:D|Matrix)|Division|Joined|L(?:a(?:bel(?:s)?|yout)|egends)|Markers|Points|R(?:ange(?:Clip(?:PlanesStyle|ping)|Padding)?|egion)|Style|Theme)?)

Although this looks a bit messy, the idea is simple: it tests the prefix Plot and if doesn't match, no further testing is required. If it does match, then it jumps into the inner expression and tests the next prefixes until it either has a complete match or not.

I have timed regex for all 7000 keywords matching against themselves and 7000 dictionary words as negative examples with Kotlin. The Trie implementation needs 78ms, while the alternation regex needs 523ms. That's the vast improvement I expected.

However, in JavaScript, the Trie implementation seems to be a bit slower. For the purpose of profiling, I set up 2 webpages that call the prettify engine with the two different methods. Here is the Trie implementation and here is the alternation implementation. I then used Chrome dev tools to profile this but I'm not particularly experienced with JS.

Questions:

  1. Can someone explain, why the Trie regex appears to be slower when (a) the regex itself is smaller (although heavily nested) and (b) the regex should in theory make many shortcuts when matching?
  2. Given the two test-sites or even the pure regex here and here, can someone tell me how to properly profile this? The list of keywords and dict-words are available here.
Emma
  • 27,428
  • 11
  • 44
  • 69
halirutan
  • 4,281
  • 18
  • 44
  • My gut tells me this should be asked on https://codereview.stackexchange.com/ (however, that may not work out in this case). – wp78de May 16 '19 at 22:35
  • The RegExp engine normally parses the regexp into a finite automaton. So it's not matching the way you naively assume it is. – Barmar May 17 '19 at 00:20
  • When it's parsing, it can detect common prefixes and implement them like the Trie. – Barmar May 17 '19 at 00:21
  • @Barmar OK, so one assumption is that the engine does internally something similar and builds an automaton that makes use of the common prefixes. This sounds reasonable. What I hoped to find was some kind of reference where this is mentioned. I checked many JS regex performance posts but was indeed not able to find what I was looking for. – halirutan May 17 '19 at 00:25
  • [this](https://swtch.com/~rsc/regexp/) looks like a good collection of links to papers about how regular expressions are implemented. – Barmar May 17 '19 at 00:27
  • Benchmark of keywords: https://pastebin.com/9NMBn5d0 2 regex, one is the original (on bottom), the other is a _full blown, multi-level, trie_ . One takes 13 ms to find 7071 items, the original takes 467 ms. –  May 17 '19 at 01:04
  • Keyword regex formatted https://pastebin.com/b1ejcvCS –  May 17 '19 at 01:08
  • @sln This is interesting as it reflects the results I had with Kotlin. Thank you very much for that. If you provide an answer with this information, I'm glad to accept it as my original claim that in JS the trie is slower is obviously wrong. – halirutan May 17 '19 at 01:15
  • @Barmar Thank you very much for the information! – halirutan May 17 '19 at 01:16
  • @halirutan - This is a _full blown, multi-level regex trie_ Engines don't have to do any work at all, nothing. It's all done for them. –  May 17 '19 at 01:30
  • [175,000 Word Dictionary](http://www.regexformat.com/Dnl/_Samples/_Ternary_Tool%20(Dictionary)/___txt/_ASCII_175,000_word_Mix_A-Z_Multi_Lined.txt) Ternary Tree regex. Takes less than 5 steps to find _any word_. –  May 17 '19 at 01:34

1 Answers1

2

The tool: Strings To Regex - Ternary Tree


namedCharacters (formatted https://pastebin.com/HPJRGxuf)

Regex1:   \\\[(?:A(?:Acute|Bar|Cup|DoubleDot|E|Grave|Hat|Ring|Tilde|kuz|l(?:eph|i(?:as(?:Delimite|Indicato)r|gnmentMarker)|pha|tKey)|n(?:dy?|g(?:le|strom))|quariusSign|riesSign|scendingEllipsis|uto(?:LeftMatch|Operand|Placeholder|RightMatch|Space))|B(?:ackslash|e(?:amed(?:Eigh|Sixteen)thNote|cause|ta?)|lack(?:Bishop|K(?:ing|night)|Pawn|Queen|Rook)|reve|ullet)|C(?:Acute|Cedilla|Hacek|a(?:ncerSign|p(?:ital(?:A(?:Acute|Bar|Cup|DoubleDot|E|Grave|Hat|Ring|Tilde|lpha)|Beta|C(?:Acute|Cedilla|Hacek|hi)|D(?:Hacek|elta|i(?:fferentialD|gamma))|E(?:Acute|Bar|Cup|DoubleDot|Grave|Ha(?:cek|t)|psilon|t[ah])|Gamma|I(?:Acute|Cup|DoubleDot|Grave|Hat|ota)|K(?:a|o)ppa|L(?:Slash|ambda)|Mu|N(?:Hacek|Tilde|u)|O(?:Acute|Double(?:Acute|Dot)|E|Grave|Hat|Slash|Tilde|m(?:ega|icron))|P(?:h||s)i|R(?:Hacek|ho)|S(?:Hacek|ampi|igma|tigma)|T(?:Hacek|au|h(?:eta|orn))|U(?:Acute|Double(?:Acute|Dot)|Grave|Hat|Ring|psilon)|Xi|YAcute|Z(?:Hacek|eta))|ricornSign)?)|e(?:dilla|nt(?:er(?:Dot|Ellipsis))?)|h(?:eck(?:edBox|mark(?:edBox)?)|i)|ircle(?:Dot|Minus|Plus|Times)|l(?:o(?:ckwiseContourIntegral|seCurly(?:Double)?Quote|verLeaf)|ubSuit)|o(?:lon|mmandKey|n(?:ditioned|gruent|jugate(?:Transpose)?|stantC|t(?:inu(?:ation|edFractionK)|ourIntegral|rolKey))|p(?:roduc|yrigh)t|unterClockwiseContourIntegral)|ross|u(?:p(?:Cap)?|r(?:l(?:y(?:CapitalUpsilon|Epsilon|Kappa|P(?:h)?i|Rho|Theta))?|rency)))|D(?:Hacek|a(?:gger|let|sh)|e(?:gree|l(?:eteKey|ta)?|scendingEllipsis)|i(?:am(?:eter|ond(?:Suit)?)|fferen(?:ceDelta|tialD)|gamma|rectedEdge|s(?:cret(?:e(?:Ratio|Shift)|ionary(?:Hyphen|LineSeparator|Pa(?:geBreak(?:Above|Below)|ragraphSeparator)))|tributed)|v(?:ergence|i(?:des?|sionSlash)))|o(?:t(?:Equal|less[IJ]|tedSquare)|uble(?:ContourIntegral|D(?:agger|o(?:t|wnArrow))|L(?:eft(?:Arrow|RightArrow|Tee)|ong(?:Left(?:A|RightA)rrow|RightArrow))|Prime|Right(?:Arrow|Tee)|Struck(?:[AB]|C(?:apital[A-Z])?|D|E(?:ight)?|F(?:ive|our)?|[G-M]|N(?:ine)?|O(?:ne)?|[P-R]|S(?:even|ix)?|T(?:hree|wo)?|[U-Y]|Z(?:ero)?)|Up(?:A|DownA)rrow|VerticalBar|d(?:Gamma|Pi))|wn(?:Arrow(?:Bar|UpArrow)?|Breve|Exclamation|Left(?:RightVector|TeeVector|Vector(?:Bar)?)|Pointer|Question|Right(?:TeeVector|Vector(?:Bar)?)|Tee(?:Arrow)?)))|E(?:Acute|Bar|Cup|DoubleDot|Grave|Ha(?:cek|t)|arth|ighthNote|l(?:ement|lipsis)|mpty(?:Circle|D(?:iamond|ownTriangle)|Rectangle|S(?:et|mall(?:Circl|Squar)e|quare)|UpTriangle|VerySmallSquare)|nt(?:erKey|ity(?:End|Start))|psilon|qu(?:al(?:Tilde)?|i(?:librium|valent))|rrorIndicator|scapeKey|t[ah]|uro|x(?:ists|p(?:ectation|onential)E))|F(?:i(?:Ligature|lled(?:Circle|D(?:iamond|ownTriangle)|LeftTriangle|R(?:ect|ightTri)angle|S(?:mall(?:Circl|Squar)e|quare)|UpTriangle|VerySmallSquare)|nalSigma|rstPage|vePointedStar)|l(?:Ligature|at|orin)|or(?:All|mal(?:A(?:lpha)?|B(?:eta)?|C(?:apital(?:A(?:lpha)?|B(?:eta)?|C(?:hi)?|D(?:elta|igamma)?|E(?:psilon|ta)?|F|G(?:amma)?|H|I(?:ota)?|J|K(?:appa|oppa)?|L(?:ambda)?|Mu?|Nu?|O(?:m(?:ega|icron))?|P(?:hi|i|si)?|Q|R(?:ho)?|S(?:ampi|igma|tigma)?|T(?:au|heta)?|U(?:psilon)?|[VW]|Xi?|Y|Z(?:eta)?)|hi|urly(?:CapitalUpsilon|Epsilon|Kappa|P(?:h)?i|Rho|Theta))?|D(?:elta|igamma)?|E(?:psilon|ta)?|F(?:inalSigma)?|G(?:amma)?|H|I(?:ota)?|J|K(?:appa|oppa)?|L(?:ambda)?|Mu?|Nu?|O(?:m(?:ega|icron))?|P(?:hi|i|si)?|Q|R(?:ho)?|S(?:ampi|igma|tigma)?|T(?:au|heta)?|U(?:psilon)?|[VW]|Xi?|Y|Z(?:eta)?))|re(?:akedSmiley|eformPrompt)|unction)|G(?:amma|eminiSign|imel|othic(?:[AB]|C(?:apital[A-Z])?|D|E(?:ight)?|F(?:ive|our)?|[G-M]|N(?:ine)?|O(?:ne)?|[P-R]|S(?:even|ix)?|T(?:hree|wo)?|[U-Y]|Z(?:ero)?)|r(?:a(?:dient|y(?:Circl|Squar)e)|eater(?:Equal(?:Less)?|FullEqual|Greater|Less|SlantEqual|Tilde)))|H(?:Bar|a(?:cek|ppySmiley)|e(?:artSuit|rmitianConjugate)|orizontalLine|ump(?:DownHump|Equal)|yphen)|I(?:Acute|Cup|DoubleDot|Grave|Hat|m(?:aginary[IJ]|pli(?:citPlu|e)s)|n(?:dentingNewLine|finity|linePart|te(?:gral|rsection)|visible(?:Application|Comma|P(?:ost|re)fixScriptBase|Space|Times))|ota)|Jupiter|K(?:appa|e(?:rnelIcon|yBar)|oppa)|L(?:Slash|a(?:mbda|placian|stPage)|e(?:ft(?:A(?:ngleBracket|rrow(?:Bar|RightArrow)?|ssociation)|BracketingBar|Ceiling|Do(?:ubleBracket(?:ingBar)?|wn(?:TeeVector|Vector(?:Bar)?))|Floor|Guillemet|Modified|Pointer|Right(?:Arrow|Vector)|Skeleton|T(?:ee(?:Arrow|Vector)?|riangle(?:Bar|Equal)?)|Up(?:DownVector|TeeVector|Vector(?:Bar)?)|Vector(?:Bar)?)|oSign|ss(?:Equal(?:Greater)?|FullEqual|Greater|Less|SlantEqual|Tilde)|tterSpace)|i(?:braSign|ghtBulb|mit|neSeparator)|o(?:ng(?:Dash|Equal|Left(?:A|RightA)rrow|RightArrow)|wer(?:Lef|Righ)tArrow))|M(?:a(?:rs|thematicaIcon|xLimit)|e(?:asuredAngle|diumSpace|rcury)|ho|i(?:cro|n(?:Limit|us(?:Plus)?))|o(?:d(?:1|2)Key|on)|u)|N(?:Hacek|Tilde|a(?:nd|tural)|e(?:gative(?:MediumSpace|Thi(?:ck|n)Space|VeryThinSpace)|ptune|sted(?:GreaterGreater|LessLess)|utralSmiley)|o(?:Break|nBreakingSpace|r|t(?:C(?:ongruent|upCap)|DoubleVerticalBar|E(?:lement|qual(?:Tilde)?|xists)|Greater(?:Equal|FullEqual|Greater|Less|SlantEqual|Tilde)?|Hump(?:DownHump|Equal)|Le(?:ftTriangle(?:Bar|Equal)?|ss(?:Equal|FullEqual|Greater|Less|SlantEqual|Tilde)?)|Nested(?:GreaterGreater|LessLess)|Precedes(?:Equal|SlantEqual|Tilde)?|R(?:everseElement|ightTriangle(?:Bar|Equal)?)|S(?:quareSu(?:bset(?:Equal)?|perset(?:Equal)?)|u(?:bset(?:Equal)?|cceeds(?:Equal|SlantEqual|Tilde)?|perset(?:Equal)?))|Tilde(?:Equal|FullEqual|Tilde)?|VerticalBar)?)|u(?:ll|mberSign)?)|O(?:Acute|Double(?:Acute|Dot)|E|Grave|Hat|Slash|Tilde|m(?:ega|icron)|p(?:enCurly(?:Double)?Quote|tionKey)|r|ver(?:Brac(?:e|ket)|Parenthesis))|P(?:a(?:geBreak(?:Above|Below)|r(?:agraph(?:Separator)?|tialD))|er(?:mutationProduct|pendicular)|hi|i(?:ecewise|scesSign)?|l(?:aceholder|u(?:sMinus|to))|r(?:ecedes(?:Equal|SlantEqual|Tilde)?|ime|o(?:babilityPr|duct|portion(?:al)?))|si)|QuarterNote|R(?:Hacek|awEscape|e(?:gisteredTrademark|turn(?:Indicator|Key)|verse(?:DoublePrime|E(?:lement|quilibrium)|Prime|UpEquilibrium))|ho|ight(?:A(?:ngle(?:Bracket)?|rrow(?:Bar|LeftArrow)?|ssociation)|BracketingBar|Ceiling|Do(?:ubleBracket(?:ingBar)?|wn(?:TeeVector|Vector(?:Bar)?))|Floor|Guillemet|Modified|Pointer|Skeleton|T(?:ee(?:Arrow|Vector)?|riangle(?:Bar|Equal)?)|Up(?:DownVector|TeeVector|Vector(?:Bar)?)|Vector(?:Bar)?)|ound(?:Implies|SpaceIndicator)|u(?:le(?:Delayed)?|pee))|S(?:Hacek|Z|a(?:dSmiley|gittariusSign|mpi|turn)|c(?:orpioSign|ript(?:[AB]|C(?:apital[A-Z])?|D(?:otless[IJ])?|E(?:ight)?|F(?:ive|our)?|[G-M]|N(?:ine)?|O(?:ne)?|[P-R]|S(?:even|ix)?|T(?:hree|wo)?|[U-Y]|Z(?:ero)?))|e(?:ction|lectionPlaceholder)|h(?:a(?:h|rp)|iftKey|ort(?:Down|Left|Right|Up)Arrow)|i(?:gma|xPointedStar)|keletonIndicator|mallCircle|p(?:a(?:ce(?:Indicator|Key)|deSuit|nFrom(?:Above|Both|Left))|hericalAngle|ooky)|q(?:rt|uare(?:Intersection|Su(?:bset(?:Equal)?|perset(?:Equal)?)|Union)?)|t(?:ar|e(?:pper(?:Down|Left|Right|Up)|rling)|igma)|u(?:bset(?:Equal)?|c(?:ceeds(?:Equal|SlantEqual|Tilde)?|hThat)|[mn]|perset(?:Equal)?)|ystem(?:EnterKe|sModelDela)y)|T(?:Hacek|a(?:bKey|u(?:rusSign)?)|ensor(?:Product|Wedge)|h(?:e(?:refore|ta)|i(?:ck|n)Space|orn)|i(?:lde(?:Equal|FullEqual|Tilde)?|mes)|r(?:a(?:demark|nspose)|ipleDot)|woWayRule)|U(?:Acute|Double(?:Acute|Dot)|Grave|Hat|Ring|n(?:d(?:er(?:Brac(?:e|ket)|Parenthesis)|irectedEdge)|ion(?:Plus)?|knownGlyph)|p(?:Arrow(?:Bar|DownArrow)?|DownArrow|Equilibrium|Pointer|Tee(?:Arrow)?|per(?:Lef|Righ)tArrow|silon)|ranus)|V(?:e(?:ctor(?:Greater(?:Equal)?|Less(?:Equal)?)|e|nus|r(?:tical(?:Bar|Ellipsis|Line|Separator|Tilde)|yThinSpace))|i(?:lla|rgoSign))|W(?:a(?:rningSig|tchIco)n|e(?:dge|ierstrassP)|hite(?:Bishop|K(?:ing|night)|Pawn|Queen|Rook)|olf(?:ram(?:AlphaPrompt|LanguageLogo(?:Circle)?))?)|X(?:i|nor|or)|Y(?:Acute|DoubleDot|en)|Z(?:Hacek|eta))\]
Options:  < none >
Completed iterations:   1  /  1     ( x 1 )
Matches found per iteration:   1013
Elapsed Time:    0.00 s,   1.42 ms,   1420 µs
Matches per sec:   713,380


Regex2:   \\\[(?:RawEscape|NonBreakingSpace|DownExclamation|Cent|Sterling|Currency|Yen|Section|DoubleDot|Copyright|LeftGuillemet|Not|DiscretionaryHyphen|RegisteredTrademark|Degree|PlusMinus|Micro|Paragraph|CenterDot|Cedilla|RightGuillemet|DownQuestion|CapitalAGrave|CapitalAAcute|CapitalAHat|CapitalATilde|CapitalADoubleDot|CapitalARing|CapitalAE|CapitalCCedilla|CapitalEGrave|CapitalEAcute|CapitalEHat|CapitalEDoubleDot|CapitalIGrave|CapitalIAcute|CapitalIHat|CapitalIDoubleDot|CapitalEth|CapitalNTilde|CapitalOGrave|CapitalOAcute|CapitalOHat|CapitalOTilde|CapitalODoubleDot|Times|CapitalOSlash|CapitalUGrave|CapitalUAcute|CapitalUHat|CapitalUDoubleDot|CapitalYAcute|CapitalThorn|SZ|AGrave|AAcute|AHat|ATilde|ADoubleDot|ARing|AE|CCedilla|EGrave|EAcute|EHat|EDoubleDot|IGrave|IAcute|IHat|IDoubleDot|Eth|NTilde|OGrave|OAcute|OHat|OTilde|ODoubleDot|Divide|OSlash|UGrave|UAcute|UHat|UDoubleDot|YAcute|Thorn|YDoubleDot|CapitalABar|ABar|CapitalACup|ACup|CapitalCAcute|CAcute|CapitalCHacek|CHacek|CapitalDHacek|DHacek|CapitalEBar|EBar|CapitalECup|ECup|CapitalEHacek|EHacek|CapitalICup|ICup|DotlessI|CapitalLSlash|LSlash|CapitalNHacek|NHacek|CapitalODoubleAcute|ODoubleAcute|CapitalOE|OE|CapitalRHacek|RHacek|CapitalSHacek|SHacek|CapitalTHacek|THacek|CapitalURing|URing|CapitalUDoubleAcute|UDoubleAcute|CapitalZHacek|ZHacek|Florin|Hacek|Breve|CapitalAlpha|CapitalBeta|CapitalGamma|CapitalDelta|CapitalEpsilon|CapitalZeta|CapitalEta|CapitalTheta|CapitalIota|CapitalKappa|CapitalLambda|CapitalMu|CapitalNu|CapitalXi|CapitalOmicron|CapitalPi|CapitalRho|CapitalSigma|CapitalTau|CapitalUpsilon|CapitalPhi|CapitalChi|CapitalPsi|CapitalOmega|Alpha|Beta|Gamma|Delta|CurlyEpsilon|Zeta|Eta|Theta|Iota|Kappa|Lambda|Mu|Nu|Xi|Omicron|Pi|Rho|FinalSigma|Sigma|Tau|Upsilon|CurlyPhi|Chi|Psi|Omega|CurlyTheta|CurlyCapitalUpsilon|Phi|CurlyPi|CapitalStigma|Stigma|CapitalDigamma|Digamma|CapitalKoppa|Koppa|CapitalSampi|Sampi|CurlyKappa|CurlyRho|Epsilon|ThickSpace|ThinSpace|VeryThinSpace|Hyphen|Dash|LongDash|OpenCurlyQuote|CloseCurlyQuote|OpenCurlyDoubleQuote|CloseCurlyDoubleQuote|Dagger|DoubleDagger|Bullet|Ellipsis|LineSeparator|ParagraphSeparator|Prime|DoublePrime|ReversePrime|ReverseDoublePrime|SkeletonIndicator|MediumSpace|NoBreak|InvisibleTimes|Euro|Rupee|ScriptG|ScriptCapitalH|GothicCapitalH|HBar|ScriptCapitalI|GothicCapitalI|ScriptCapitalL|ScriptL|WeierstrassP|ScriptCapitalR|GothicCapitalR|Trademark|Mho|GothicCapitalZ|Angstrom|ScriptCapitalB|GothicCapitalC|ScriptE|ScriptCapitalE|ScriptCapitalF|ScriptCapitalM|ScriptO|Aleph|Bet|Gimel|Dalet|LeftArrow|UpArrow|RightArrow|DownArrow|LeftRightArrow|UpDownArrow|UpperLeftArrow|UpperRightArrow|LowerRightArrow|LowerLeftArrow|LeftTeeArrow|UpTeeArrow|RightTeeArrow|DownTeeArrow|ReturnIndicator|LeftVector|DownLeftVector|RightUpVector|LeftUpVector|RightVector|DownRightVector|RightDownVector|LeftDownVector|RightArrowLeftArrow|UpArrowDownArrow|LeftArrowRightArrow|ReverseEquilibrium|Equilibrium|DoubleLeftArrow|DoubleUpArrow|DoubleRightArrow|DoubleDownArrow|DoubleLeftRightArrow|DoubleUpDownArrow|LeftArrowBar|RightArrowBar|DownArrowUpArrow|ForAll|PartialD|Exists|NotExists|EmptySet|Laplacian|Del|Element|NotElement|ReverseElement|NotReverseElement|SuchThat|Product|Coproduct|Sum|Minus|MinusPlus|DivisionSlash|Backslash|SmallCircle|Sqrt|Proportional|Infinity|RightAngle|Angle|MeasuredAngle|SphericalAngle|Divides|DoubleVerticalBar|NotDoubleVerticalBar|And|Or|Integral|ContourIntegral|DoubleContourIntegral|ClockwiseContourIntegral|CounterClockwiseContourIntegral|Therefore|Because|Colon|Proportion|Tilde|VerticalTilde|NotTilde|EqualTilde|TildeEqual|NotTildeEqual|TildeFullEqual|NotTildeFullEqual|TildeTilde|NotTildeTilde|CupCap|HumpDownHump|HumpEqual|DotEqual|NotEqual|Congruent|NotCongruent|LessEqual|GreaterEqual|LessFullEqual|GreaterFullEqual|NotLessFullEqual|NotGreaterFullEqual|LessLess|GreaterGreater|NotCupCap|NotLess|NotGreater|NotLessEqual|NotGreaterEqual|LessTilde|GreaterTilde|NotLessTilde|NotGreaterTilde|LessGreater|GreaterLess|NotLessGreater|NotGreaterLess|Precedes|Succeeds|PrecedesSlantEqual|SucceedsSlantEqual|PrecedesTilde|SucceedsTilde|NotPrecedes|NotSucceeds|Subset|Superset|NotSubset|NotSuperset|SubsetEqual|SupersetEqual|NotSubsetEqual|NotSupersetEqual|UnionPlus|SquareSubset|SquareSuperset|SquareSubsetEqual|SquareSupersetEqual|SquareIntersection|SquareUnion|CirclePlus|CircleMinus|CircleTimes|CircleDot|RightTee|LeftTee|DownTee|UpTee|DoubleRightTee|LeftTriangle|RightTriangle|LeftTriangleEqual|RightTriangleEqual|Xor|Nand|Nor|Wedge|Vee|Intersection|Union|Diamond|Star|LessEqualGreater|GreaterEqualLess|NotPrecedesSlantEqual|NotSucceedsSlantEqual|NotSquareSubsetEqual|NotSquareSupersetEqual|NotPrecedesTilde|NotSucceedsTilde|NotLeftTriangle|NotRightTriangle|NotLeftTriangleEqual|NotRightTriangleEqual|VerticalEllipsis|CenterEllipsis|AscendingEllipsis|DescendingEllipsis|Diameter|LeftCeiling|RightCeiling|LeftFloor|RightFloor|CloverLeaf|WatchIcon|Cap|Cup|LeftAngleBracket|RightAngleBracket|OverBracket|UnderBracket|SpaceIndicator|HorizontalLine|VerticalLine|FilledSquare|EmptySquare|FilledVerySmallSquare|EmptyVerySmallSquare|FilledRectangle|EmptyRectangle|FilledUpTriangle|EmptyUpTriangle|UpPointer|FilledRightTriangle|RightPointer|FilledDownTriangle|EmptyDownTriangle|DownPointer|FilledLeftTriangle|LeftPointer|FilledDiamond|EmptyDiamond|EmptyCircle|FilledCircle|EmptySmallCircle|EmptySmallSquare|FilledSmallSquare|FivePointedStar|Sun|CheckmarkedBox|CheckedBox|SadSmiley|HappySmiley|Moon|Mercury|Venus|Mars|Jupiter|Saturn|Neptune|Pluto|AriesSign|TaurusSign|GeminiSign|CancerSign|LeoSign|VirgoSign|LibraSign|ScorpioSign|SagittariusSign|CapricornSign|AquariusSign|PiscesSign|WhiteKing|WhiteQueen|WhiteRook|WhiteBishop|WhiteKnight|WhitePawn|BlackKing|BlackQueen|BlackRook|BlackBishop|BlackKnight|BlackPawn|SpadeSuit|HeartSuit|DiamondSuit|ClubSuit|QuarterNote|EighthNote|BeamedEighthNote|BeamedSixteenthNote|Flat|Natural|Sharp|Uranus|Checkmark|SixPointedStar|Perpendicular|LongLeftArrow|LongRightArrow|LongLeftRightArrow|DoubleLongLeftArrow|DoubleLongRightArrow|DoubleLongLeftRightArrow|UpArrowBar|DownArrowBar|LeftRightVector|RightUpDownVector|DownLeftRightVector|LeftUpDownVector|LeftVectorBar|RightVectorBar|RightUpVectorBar|RightDownVectorBar|DownLeftVectorBar|DownRightVectorBar|LeftUpVectorBar|LeftDownVectorBar|LeftTeeVector|RightTeeVector|RightUpTeeVector|RightDownTeeVector|DownLeftTeeVector|DownRightTeeVector|LeftUpTeeVector|LeftDownTeeVector|UpEquilibrium|ReverseUpEquilibrium|RoundImplies|LeftTriangleBar|RightTriangleBar|Equivalent|LessSlantEqual|GreaterSlantEqual|NestedLessLess|NestedGreaterGreater|PrecedesEqual|SucceedsEqual|DoubleLeftTee|LeftDoubleBracket|RightDoubleBracket|LeftAssociation|RightAssociation|Shah|WolframLanguageLogo|WolframLanguageLogoCircle|TwoWayRule|FreeformPrompt|WolframAlphaPrompt|InvisibleSpace|Piecewise|NegativeVeryThinSpace|NegativeThinSpace|NegativeMediumSpace|NegativeThickSpace|ImplicitPlus|Null|IndentingNewLine|AutoPlaceholder|AutoLeftMatch|AutoRightMatch|AutoSpace|AutoOperand|SystemsModelDelay|Continuation|RoundSpaceIndicator|InvisiblePrefixScriptBase|InvisiblePostfixScriptBase|EntityStart|EntityEnd|SpanFromLeft|SpanFromAbove|SpanFromBoth|PageBreakAbove|PageBreakBelow|DiscretionaryPageBreakAbove|DiscretionaryPageBreakBelow|Transpose|Conjugate|ConjugateTranspose|StepperRight|StepperLeft|StepperUp|StepperDown|HermitianConjugate|VerticalBar|NotVerticalBar|Distributed|Conditioned|UndirectedEdge|DirectedEdge|Gradient|Divergence|Curl|ContinuedFractionK|TensorProduct|TensorWedge|ProbabilityPr|ExpectationE|PermutationProduct|Earth|NotEqualTilde|NotHumpEqual|NotHumpDownHump|NotLeftTriangleBar|NotRightTriangleBar|NotLessLess|NotNestedLessLess|NotLessSlantEqual|NotGreaterGreater|NotNestedGreaterGreater|NotGreaterSlantEqual|NotPrecedesEqual|NotSucceedsEqual|NotSquareSubset|NotSquareSuperset|Equal|VerticalSeparator|VectorGreater|VectorGreaterEqual|VectorLess|VectorLessEqual|Limit|MaxLimit|MinLimit|Cross|Function|Xnor|DiscreteShift|DifferenceDelta|DiscreteRatio|InlinePart|RuleDelayed|Square|Rule|Implies|ShortRightArrow|ShortLeftArrow|SelectionPlaceholder|Placeholder|ShortUpArrow|ShortDownArrow|LeftBracketingBar|RightBracketingBar|LeftDoubleBracketingBar|RightDoubleBracketingBar|ScriptA|ScriptB|ScriptC|ScriptD|ScriptF|ScriptH|ScriptI|ScriptJ|ScriptK|ScriptM|ScriptN|ScriptP|ScriptQ|ScriptR|ScriptS|ScriptT|ScriptU|ScriptV|ScriptW|ScriptX|ScriptY|ScriptZ|GothicA|GothicB|GothicC|GothicD|GothicE|GothicF|GothicG|GothicH|GothicI|GothicJ|GothicK|GothicL|GothicM|GothicN|GothicO|GothicP|GothicQ|GothicR|GothicS|GothicT|GothicU|GothicV|GothicW|GothicX|GothicY|GothicZ|DoubleStruckA|DoubleStruckB|DoubleStruckC|DoubleStruckD|DoubleStruckE|DoubleStruckF|DoubleStruckG|DoubleStruckH|DoubleStruckI|DoubleStruckJ|DoubleStruckK|DoubleStruckL|DoubleStruckM|DoubleStruckN|DoubleStruckO|DoubleStruckP|DoubleStruckQ|DoubleStruckR|DoubleStruckS|DoubleStruckT|DoubleStruckU|DoubleStruckV|DoubleStruckW|DoubleStruckX|DoubleStruckY|DoubleStruckZ|DotlessJ|Wolf|FreakedSmiley|NeutralSmiley|LightBulb|NumberSign|WarningSign|Villa|Akuz|Andy|Spooky|ScriptDotlessI|ScriptDotlessJ|DoubledPi|DoubledGamma|CapitalDifferentialD|DifferentialD|ExponentialE|ImaginaryI|ImaginaryJ|FilledSmallCircle|DottedSquare|GraySquare|GrayCircle|LetterSpace|DownBreve|KernelIcon|MathematicaIcon|TripleDot|SystemEnterKey|AlignmentMarker|LeftSkeleton|RightSkeleton|ControlKey|AliasDelimiter|InvisibleComma|ReturnKey|ErrorIndicator|AliasIndicator|EscapeKey|CommandKey|LeftModified|RightModified|InvisibleApplication|DiscretionaryLineSeparator|DiscretionaryParagraphSeparator|ScriptCapitalA|ScriptCapitalC|ScriptCapitalD|ScriptCapitalG|ScriptCapitalJ|ScriptCapitalK|ScriptCapitalN|ScriptCapitalO|ScriptCapitalP|ScriptCapitalQ|ScriptCapitalS|ScriptCapitalT|ScriptCapitalU|ScriptCapitalV|ScriptCapitalW|ScriptCapitalX|ScriptCapitalY|ScriptCapitalZ|GothicCapitalA|GothicCapitalB|GothicCapitalD|GothicCapitalE|GothicCapitalF|GothicCapitalG|GothicCapitalJ|GothicCapitalK|GothicCapitalL|GothicCapitalM|GothicCapitalN|GothicCapitalO|GothicCapitalP|GothicCapitalQ|GothicCapitalS|GothicCapitalT|GothicCapitalU|GothicCapitalV|GothicCapitalW|GothicCapitalX|GothicCapitalY|DoubleStruckCapitalA|DoubleStruckCapitalB|DoubleStruckCapitalC|DoubleStruckCapitalD|DoubleStruckCapitalE|DoubleStruckCapitalF|DoubleStruckCapitalG|DoubleStruckCapitalH|DoubleStruckCapitalI|DoubleStruckCapitalJ|DoubleStruckCapitalK|DoubleStruckCapitalL|DoubleStruckCapitalM|DoubleStruckCapitalN|DoubleStruckCapitalO|DoubleStruckCapitalP|DoubleStruckCapitalQ|DoubleStruckCapitalR|DoubleStruckCapitalS|DoubleStruckCapitalT|DoubleStruckCapitalU|DoubleStruckCapitalV|DoubleStruckCapitalW|DoubleStruckCapitalX|DoubleStruckCapitalY|DoubleStruckCapitalZ|TabKey|SpaceKey|DeleteKey|AltKey|OptionKey|KeyBar|EnterKey|ShiftKey|Mod1Key|Mod2Key|LongEqual|ConstantC|DoubleStruckZero|DoubleStruckOne|DoubleStruckTwo|DoubleStruckThree|DoubleStruckFour|DoubleStruckFive|DoubleStruckSix|DoubleStruckSeven|DoubleStruckEight|DoubleStruckNine|GothicZero|GothicOne|GothicTwo|GothicThree|GothicFour|GothicFive|GothicSix|GothicSeven|GothicEight|GothicNine|ScriptZero|ScriptOne|ScriptTwo|ScriptThree|ScriptFour|ScriptFive|ScriptSix|ScriptSeven|ScriptEight|ScriptNine|FirstPage|LastPage|FormalA|FormalB|FormalC|FormalD|FormalE|FormalF|FormalG|FormalH|FormalI|FormalJ|FormalK|FormalL|FormalM|FormalN|FormalO|FormalP|FormalQ|FormalR|FormalS|FormalT|FormalU|FormalV|FormalW|FormalX|FormalY|FormalZ|FormalCapitalA|FormalCapitalB|FormalCapitalC|FormalCapitalD|FormalCapitalE|FormalCapitalF|FormalCapitalG|FormalCapitalH|FormalCapitalI|FormalCapitalJ|FormalCapitalK|FormalCapitalL|FormalCapitalM|FormalCapitalN|FormalCapitalO|FormalCapitalP|FormalCapitalQ|FormalCapitalR|FormalCapitalS|FormalCapitalT|FormalCapitalU|FormalCapitalV|FormalCapitalW|FormalCapitalX|FormalCapitalY|FormalCapitalZ|FormalCapitalAlpha|FormalCapitalBeta|FormalCapitalGamma|FormalCapitalDelta|FormalCapitalEpsilon|FormalCapitalZeta|FormalCapitalEta|FormalCapitalTheta|FormalCapitalIota|FormalCapitalKappa|FormalCapitalLambda|FormalCapitalMu|FormalCapitalNu|FormalCapitalXi|FormalCapitalOmicron|FormalCapitalPi|FormalCapitalRho|FormalCapitalSigma|FormalCapitalTau|FormalCapitalUpsilon|FormalCapitalPhi|FormalCapitalChi|FormalCapitalPsi|FormalCapitalOmega|FormalAlpha|FormalBeta|FormalGamma|FormalDelta|FormalCurlyEpsilon|FormalZeta|FormalEta|FormalTheta|FormalIota|FormalKappa|FormalLambda|FormalMu|FormalNu|FormalXi|FormalOmicron|FormalPi|FormalRho|FormalFinalSigma|FormalSigma|FormalTau|FormalUpsilon|FormalCurlyPhi|FormalChi|FormalPsi|FormalOmega|FormalCurlyTheta|FormalCurlyCapitalUpsilon|FormalPhi|FormalCurlyPi|FormalCapitalStigma|FormalStigma|FormalCapitalDigamma|FormalDigamma|FormalCapitalKoppa|FormalKoppa|FormalCapitalSampi|FormalSampi|FormalCurlyKappa|FormalCurlyRho|FormalEpsilon|FiLigature|FlLigature|OverParenthesis|UnderParenthesis|OverBrace|UnderBrace|UnknownGlyph)\]
Options:  < none >
Completed iterations:   1  /  1     ( x 1 )
Matches found per iteration:   1013
Elapsed Time:    0.01 s,   11.79 ms,   11787 µs
Matches per sec:   85,942
  • 3
    Alright, I see now that this is different from my Trie in some details as it also takes care of suffixes. Example: my version contains `udio(?:InputDevices|OutputDevices)` while yours uses `udio(?:In|Out)putDevices)` which probably further optimizes the structure. I will look tomorrow closer. However, this basically answers all my questions. – halirutan May 17 '19 at 01:45