11

Are there regular expression equivalents for searching and modifying tree structures? Concise mini-languages (like perl regex) are what I am looking for.

Here is an example that might clarify what I am looking for.

<root>
  <node name="1">
    subtrees ....
  </node>
  <node name="2">
    <node name="2.1">
     data
    </node>
    other subtrees...
  </node>
</root>

An operation that would be possible on the above tree is "move subtree at node 2.1 into the subtree at node 1." The result of the operation might look something like..

<root>
  <node name="1">
    subtrees ....
    <node name="2.1">
     data
    </node>
  </node>
  <node name="2">
    other subtrees...
  </node>
</root>

Search and replace operations like find all nodes with atleast 2 children, find all nodes whose data starts with "a" and replace it with "b" if the subtrees have atleast 2 other siblings, etc. should be supported.

For strings, where the only dimension is across the length of the string, we can do many of above operations (or their 1D equivalents) using regular expressions. I wonder if there are equivalents for trees. (instead of a single regex, you might need to write a set of transformation rules, but that is ok).

I would like to know if there is some simple mini language (not regex per.se, but something that is as accessible as regex via libraries, etc..). to perform these operations? Preferably, as a python library.

Coral Doe
  • 1,925
  • 3
  • 19
  • 36
JSN
  • 1,247
  • 1
  • 11
  • 12
  • Thinking about how the syntax of that thing could be... :) – Mehrdad Afshari May 17 '09 at 17:59
  • Mmh, can you be more explicit about what you have and what the regex should do? – akappa May 17 '09 at 18:02
  • This needs to be more specific - are you parsing XML or what? – Nathaniel Flath May 17 '09 at 18:23
  • 3
    You should not call it "regular", because "regular" needs "context-free", which trees are not. – Svante May 17 '09 at 22:30
  • I agree, calling it "regular" was not correct, however, my idea in making a comparison to regular languages was in terms of utility, I want to do search and replace in strings - I do regex, I want to do the same on trees, I use ..? – JSN May 18 '09 at 17:56
  • There are two issues with @Svante 's comment: First of all, the data that a regular expression is evaluated *on* does not have to be regular. E.g., you can use regular expressions to search for the use of specific n-grams in your favourite corpus of arbitrary complex language. And second, "regular expressions", as used within programming languages, is *not* the same as "regular expressions" that define the regular languages in the Chomsky hierarchy. See [this post](https://cstheory.stackexchange.com/a/37623) for an explanation. Oh, and "trees" can of course be "context-free". – Lutz Büch Dec 12 '19 at 13:37

7 Answers7

8

TSurgeon and Tregex from Stanford is capable of doing that. You can download the library from http://nlp.stanford.edu/software/tregex.shtml

Volkan Agun
  • 96
  • 1
  • 1
5

I don't know a general purpose langugae that can do that, but it seems to me that you are looking for something like XPath.

Daniel Brückner
  • 59,031
  • 16
  • 99
  • 143
  • 1
    I've looked at XPath. It seems promising but it doesn't seem to handle expressions over sets of nodes (eg, find all nodes who have atleast 2 siblings). It has limited functionality. – JSN May 17 '09 at 19:37
5

There is TXL for pattern based tree rewriting.

Tree rewriting with patterns is also done with parser toolkits such as ANTLR

Code generation with bottom up tree rewriting, google BURS or BURG.

Doug Currie
  • 40,708
  • 1
  • 95
  • 119
  • 1
    TXL seems very promising, however both ANTLR and TXL assume a context free grammar, which is important when you also need to perform parsing. However, for the purposes of transformation and regex like behavior on trees it should explicitly be context dependent. See my clarification of the question above for some use cases that I would like (eg: search with conditions on siblings). – JSN May 17 '09 at 19:46
1

Navigating through a binary search tree requires state (in which node I am?) and comparisons (is that value less or greater than that?), things that cannot be done by a finite state automaton.

Sure, you can search for the node with a given value, but how then you could, for example, remove a node that isn't a leaf if you don't know its parent?

And even if you know the parent via the information supplied by the node, how do you determine the minimum of the left subtree, remove it and place it in the node?

I think you're asking too much to FSA.

akappa
  • 10,220
  • 3
  • 39
  • 56
  • The automaton could work if each node contains the relevant data (and states relating to that) for all data that might be matched such as ancestry and parent-state? – Aiden Bell May 17 '09 at 18:17
  • -- continuation -- Then subexpressions relating to other nodes can invoke a sub-engine to return a state or boolean mapped to a transition. – Aiden Bell May 17 '09 at 18:18
1

This article gives some tasty hints about recursive Perl regular expressions, but honestly it's rare to see a tree structure approached this way.

More typically, one would write a state machine style parser, that might use regexes to parse each particular node in the tree.

Expat is probably a good example to look at.

Sean Cavanagh
  • 4,727
  • 2
  • 19
  • 14
1

Pattern Matching, provided by languages such as Scala, F#, Erlang and Haskell (I'm sure there's more) is designed to succinctly manipulate data structures like trees, esp when used with recursion.

here is a very high level view of what pattren matching can do in Scala. The examples shown really don't do pattern matching justice.

Wikipedia has a couple of references to pattern matching, too. Here and here.

Barry Carr
  • 432
  • 4
  • 11
1

I'm somewhat surprised that XSLT hasn't come up as an answer. Granted, I don't think it's a particularly elegant language, and most existent solutions tend to favour procedural approaches rather than pattern matching, and it's gotten a mighty bad rep from being blindly applied just because it's XML being applied to XML -- but otherwise it fits the bill. Pity its canonical representation is so verbose, though...

Community
  • 1
  • 1
Pontus Gagge
  • 17,166
  • 1
  • 38
  • 51
  • Right now, XSLT seems to be the closest to what I want, but writing context sensitive queries seems convoluted, my question was to find something better than xslt. – JSN May 18 '09 at 18:03