
I am experimenting with syntax mods in Mathematica, using the Notation package.

I am not interested in mathematical notation for a specific field, but general purpose syntax modifications and extensions, especially notations that reduce the verbosity of Mathematica's VeryLongFunctionNames, clean up unwieldy constructs, or extend the language in a pleasing way.

An example modification is defining Fold[f, x] to evaluate as Fold[f, First@x, Rest@x]. This works well and is quite convenient.

Another would be defining *{1,2} to evaluate as Sequence @@ {1,2} as inspired by Python; this may or may not work in Mathematica.
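For reference, here is a minimal sketch of what the target expansion does once it is reached. It does not implement the `*{1,2}` input syntax itself; `splat` is a hypothetical helper name, not a built-in, and `f` is just an undefined symbol used for display.

```mathematica
(* Splice the elements of a list into an argument list *)
args = {1, 2};
f[Sequence @@ args, 3]
(* f[1, 2, 3] *)

(* Hypothetical helper (not a built-in) wrapping the same idea *)
splat[list_List] := Sequence @@ list;
f[splat[{1, 2}], 3]
(* f[1, 2, 3] *)
```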

Please provide information or links addressing:

  • Limits of notation and syntax modification

  • Tips and tricks for implementation

  • Existing packages, examples or experiments

  • Why this is a good or bad idea

Mr.Wizard

3 Answers

5

Not really a constructive answer, just a couple of thoughts. First, a disclaimer: I don't suggest any of the methods described below as good practices (generally they probably are not); they are just some possibilities which seem to address your specific question. Regarding the stated goal, I support the idea very much: being able to reduce verbosity is great (for the personal needs of a solo developer, at least). As for the tools: I have very little experience with the Notation package, but whether one uses it or writes a custom box-manipulation preprocessor, my feeling is that the fact that the input expression must first be parsed into boxes by the Mathematica parser severely limits what can be done. Additionally, there will likely be difficulties with using it in packages, as was mentioned in the other reply already.

It would be easiest if there were some hook like $PreRead which would allow the user to intercept the input string and process it into another string before it is fed to the parser. That would allow one to write a custom preprocessor operating at the string level (or you can call it a compiler if you wish), which would take a string in whatever syntax you design and generate Mathematica code from it. I am not aware of such a hook (it may be my ignorance, of course). Lacking that, one can use, for example, Program style cells and perhaps program some buttons which read the string from those cells, call such a preprocessor to generate the Mathematica code, and paste it into the cell next to the one containing the original code.
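As a rough sketch of what such a string-level preprocessor could look like when called by hand (no input hook is assumed here): the abbreviation table and the names `preprocess` and `toHeld` are hypothetical. The rewrite is deliberately naive: it matches whole words only, but it knows nothing about string literals or comments.

```mathematica
(* Hypothetical abbreviation table; the short names are illustrative only *)
abbreviations = {"sj" -> "StringJoin", "rev" -> "Reverse"};

(* Naive string-level rewrite: whole-word matches only,
   with no awareness of string literals or comments *)
preprocess[code_String] :=
  StringReplace[code,
    (WordBoundary ~~ #1 ~~ WordBoundary -> #2) & @@@ abbreviations];

(* Parse the rewritten string and keep the result unevaluated *)
toHeld[code_String] := ToExpression[preprocess[code], InputForm, Hold];

toHeld["sj[rev[{\"a\", \"b\"}]]"]
(* Hold[StringJoin[Reverse[{"a", "b"}]]] *)
```

Applying `ReleaseHold` to the result evaluates the generated code.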

Such a preprocessor approach would work best if the language you want is a simple one (in terms of its syntax and grammar, at least), so that it is easy to lexically analyze and parse. If you want the Mathematica language (with its full syntax, modulo just a few elements that you want to change), then with this approach you are out of luck: regardless of how few and "lightweight" your changes are, you would need to re-implement the Mathematica parser pretty much completely just to make those changes work reliably. In other words, IMO it is much easier to write a preprocessor that generates Mathematica code from some Lisp-like language with little or no syntax than to implement a few syntactic modifications to otherwise standard mma.

Technically, one way to write such a preprocessor is to use standard tools like Lex (Flex) and Yacc (Bison) to define your grammar and generate the parser (say, in C). Such a parser can be plugged back into Mathematica through either MathLink or LibraryLink (in the case of C). Its end result would be a string which, when parsed, becomes a valid Mathematica expression. This expression would represent the abstract syntax tree of your parsed code. For example, code like this (a new syntax for Fold is introduced here)

"((1|+|{2,3,4,5}))"

could be parsed into something like

"functionCall[fold,{plus,1,{2,3,4,5}}]"

The second component for such a preprocessor would be written in Mathematica, perhaps in a rule-based style, to generate Mathematica code from the AST. The resulting code must be somehow held unevaluated. For the above code, the result might look like

Hold[Fold[Plus,1,{2,3,4,5}]]
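A minimal rule-based sketch of this second stage, under the assumption that the AST uses exactly the inert heads from the example (`functionCall`, `fold`, `plus`); `symbolTable` and `generate` are hypothetical names:

```mathematica
(* functionCall, fold and plus are the AST names from the example above;
   symbolTable is an assumed mapping to built-in functions *)
ClearAll[functionCall, fold, plus, generate];
symbolTable = {fold -> Fold, plus -> Plus};

(* Rewrite functionCall nodes into ordinary calls inside Hold,
   then map the AST names to the real functions *)
generate[ast_] :=
  (Hold[ast] //. functionCall[h_, {args___}] :> h[args]) /. symbolTable;

tree = ToExpression["functionCall[fold,{plus,1,{2,3,4,5}}]"];
held = generate[tree]
(* Hold[Fold[Plus, 1, {2, 3, 4, 5}]] *)

ReleaseHold[held]
(* 15 *)
```

The rewriting happens inside `Hold`, so nothing is evaluated until `ReleaseHold` is applied.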

It would be best if analogs of tools like Lex (Flex) / Yacc (Bison) were available within Mathematica (I mean bindings which would require one to write code only in Mathematica, generate, say, a C parser from that automatically, and plug it back into the kernel through either MathLink or LibraryLink). I can only hope that they will become available in some future version. Lacking that, the approach I described would require a lot of low-level work (C, or Java if you prefer). I think it is still doable, however. If you can write C (or Java), you may try a fairly simple language (in terms of its syntax and grammar); this may be an interesting project and will give an idea of what it would be like for a more complex one. I'd start with a very basic calculator example, and perhaps change the standard arithmetic operators there to some weirder ones that Mathematica cannot parse properly itself, to make it more interesting. To avoid the MathLink / LibraryLink complexity at first and just test, you can call the resulting executable from Mathematica with Run, passing the code as one of the command-line arguments, and write the result to a temporary file that you then import into Mathematica. For the calculator example, the entire thing can be done in a few hours.

Of course, if you only want to abbreviate certain long function names, there is a much simpler alternative - you can use With to do that. Here is a practical example of that - my port of Peter Norvig's spelling corrector, where I cheated in this way to reduce the line count:

Clear[makeCorrector];
makeCorrector[corrector_Symbol, trainingText_String] :=
Module[{model, listOr, keys, words, edits1, train, max, known, knownEdits2},
(* Proxies for some commands - just to play with syntax a bit*)
With[{fn = Function, join = StringJoin, lower = ToLowerCase, 
 rev = Reverse, smatches = StringCases, seq = Sequence, chars = Characters, 
 inter = Intersection, dv = DownValues, len = Length, ins = Insert,
 flat = Flatten, clr = Clear, rep = ReplacePart, hp = HoldPattern},
(* body *)
listOr = fn[Null, Scan[If[# =!= {}, Return[#]] &, Hold[##]], HoldAll];
keys[hash_] := keys[hash] = Union[Most[dv[hash][[All, 1, 1, 1]]]];
words[text_] := lower[smatches[text, LetterCharacter ..]];
With[{m = model}, 
 train[feats_] := (clr[m]; m[_] = 1; m[#]++ & /@ feats; m)];
 With[{nwords = train[words[trainingText]], 
  alphabet = CharacterRange["a", "z"]},
  edits1[word_] := With[{c = chars[word]}, join @@@ Join[
     Table[
      rep[c, c, #, rev[#]] &@{{i}, {i + 1}}, {i, len[c] - 1}], 
     Table[Delete[c, i], {i, len[c]}], 
     flat[Outer[#1[c, ##2] &, {ins[#1, #2, #3 + 1] &, rep}, 
       alphabet, Range[len[c]], 1], 2]]];
  max[set_] := Sort[Map[{nwords[#], #} &, set]][[-1, -1]];
  known[words_] := inter[words, keys[nwords]]]; 
 knownEdits2[word_] := known[flat[Nest[Map[edits1, #, {-1}] &, word, 2]]];
 corrector[word_] := max[listOr[known[{word}], known[edits1[word]],
   knownEdits2[word], {word}]];]];

You need some training text with a large number of words as a string to pass as a second argument, and the first argument is the function name for a corrector. Here is the one that Norvig used:

text = Import["http://norvig.com/big.txt", "Text"];

You call it once, say

In[7]:= makeCorrector[correct, text]

And then use it any number of times on some words

In[8]:= correct["coputer"] // Timing

Out[8]= {0.125, "computer"}

You can make your own custom With-like control structure, where you hard-code short names for the long mma names that annoy you the most, and then wrap it around your piece of code (you'll lose the code highlighting, however). Note that I don't generally advocate this method; I did it just for fun and to reduce the line count a bit. But at least it is universal in the sense that it will work both interactively and in packages. It cannot do infix operators, cannot change precedences, etc., but it is almost zero work.
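One possible shape for such a wrapper, as a sketch only: `withShortNames` and its abbreviation list are hypothetical, and this version is implemented with replacement rules on the held body rather than with With itself.

```mathematica
(* Hypothetical wrapper with hard-coded short names; adjust the list to taste *)
ClearAll[withShortNames];
SetAttributes[withShortNames, HoldAll];
withShortNames[body_] :=
  ReleaseHold[
    Hold[body] /. {len -> Length, rev -> Reverse, flat -> Flatten}];

withShortNames[len[flat[{{1, 2}, {3}}]]]
(* 3 *)

withShortNames[rev[{1, 2, 3}]]
(* {3, 2, 1} *)
```

The short names must not have values of their own in the session, since they are replaced purely by name before the body is evaluated.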

Leonid Shifrin
  • +1 In the context of the notebook interface, the `CellEvaluationFunction` hook can be useful for this kind of processing. See [one of my answers on this topic](http://stackoverflow.com/questions/4198961/what-is-in-your-mathematica-tool-bag/5451304#5451304). – WReach Mar 27 '11 at 18:27
  • Someone gave me a link to a discussion about the `Notation`/Package problem, but I have misplaced it. While I am looking for it, do you know if anyone has tried a workaround of loading the package as a string and processing that with the FrontEnd? – Mr.Wizard Mar 27 '11 at 19:22
  • Here is a link on a related matter which you may find interesting: http://groups.google.com/group/comp.soft-sys.math.mathematica/browse_thread/thread/c000439b48751078. Regarding loading as a string - it may work, but there may also be complications because of the way the package parsing works. I was able to make it work for my specific purposes for `.mt` packages representing Mathematica unit tests. One of the problems may be package imports - explicit or hidden calls to `Needs`. I also have a feeling that processing packages with FrontEnd as a general method is conceptually wrong. – Leonid Shifrin Mar 27 '11 at 19:56
  • @WReach Thanks a lot for the link, it will make my life much easier. For me, this was a missing piece. – Leonid Shifrin Mar 27 '11 at 20:37
3

Not a full answer, but just to show a trick I learned here (more related to symbol redefinition than to Notation, I reckon):

Unprotect[Fold];
Fold[f_, x_] :=
  Block[{$inMsg = True, result},
    result = Fold[f, First@x, Rest@x];
    result] /; ! TrueQ[$inMsg];
Protect[Fold];

Fold[f, {a, b, c, d}]
(*
--> f[f[f[a, b], c], d]
*)

Edit

Thanks to @rcollyer for the following (see comments below).

You can switch the definition on or off as you please by using the $inMsg variable:

$inMsg = False;
Fold[f, {a, b, c, d}]
(*
->f[f[f[a,b],c],d]
*)

$inMsg = True;
Fold[f, {a, b, c, d}]
(*
->Fold::argrx: Fold called with 2 arguments; 3 arguments are expected.
->Fold[f, {a, b, c, d}]
*)

That's invaluable while testing.

Dr. belisarius
  • +1, this is one form of [Execute Around Block](http://weblog.jamisbuck.org/2007/1/19/blocks-rock), with variants under other names: [RAII](http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Resource_Acquisition_Is_Initialization) and [Execute Around Pointer](http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Execute-Around_Pointer). My [OpenAndRead](http://stackoverflow.com/questions/4174791/preventing-avalanche-of-runtime-errors-in-mathematica/4176381#4176381) method does the same thing for file streams. – rcollyer Mar 26 '11 at 02:21
  • +1 for still using the magic variable `$inMsg` - which you really should rename for each of these modifications. Otherwise they will interfere with each other. – Simon Mar 26 '11 at 03:15
  • @Simon I'm still learning its true meaning. Seems magic, truly :) – Dr. belisarius Mar 26 '11 at 03:20
  • @Simon renaming doesn't seem to be necessary (other than for reasons of readability) as $inMsg never gets defined outside the Block scope. – Sjoerd C. de Vries Mar 26 '11 at 11:49
  • @Sjoerd: Oh yeah... my bad. I wasn't thinking quite right about the `Block` scoping construct. Sorry for the stupid comment belisarius! – Simon Mar 26 '11 at 12:36
  • @belisarius, I think I understand it. The key is the condition on `$inMsg`. During the initial call, `$inMsg` is not defined, and hence false. So, your version of `Fold` is run, as per the condition, but once inside the `Block`, `$inMsg` is defined and true. Therefore, when `Fold` is called inside, the condition is not met, so mma defaults back to the built-in `Fold`. Quite a clever construct, actually. – rcollyer Mar 26 '11 at 14:30
  • @rcollyer Ahhh so that's why it uses Block[ ] and not Module[ ] ... to refer to the same var :D – Dr. belisarius Mar 26 '11 at 15:02
  • @belisarius, since `$inMsg` is used outside of the `Block` for the initial test, you'd better hope that someone has not set it, but it would be a convenient way to turn off your added functionality. – rcollyer Mar 26 '11 at 17:03
  • @rcollyer And ... how to "undo" the definition once it was done? (while testing that seems very useful) – Dr. belisarius Mar 26 '11 at 17:39
  • @belisarius, turn off your functionality: `$inMsg = True`, turn back on: `$inMsg = False`. The second one will allow the condition `! TrueQ[$inMsg]` to trigger. Or, more simply: `Clear[$inMsg]`, which would have the same effect. – rcollyer Mar 26 '11 at 17:55
  • @rcollyer It works OK. Now I think I understand the subtlety. Thanks! – Dr. belisarius Mar 26 '11 at 18:06
  • @belisarius, @rcollyer I'd still wrap `Module[{inFunction}, ...]` around everything, like `Module[{inFunction},f[x_]:=Block[{inFunction = True}, codeBefore;f[x];codeAfter]/;!TrueQ[inFunction]]`. Because if you don't, and you use this trick for more than one function and forget to use different `$inMsg` symbols (or they just happen to accidentally have been defined globally), you are in for *very* subtle bugs, and I could only wish happy debugging then. Using `Module` at definition-time will guarantee that the symbol is unique for each function and also not easily available globally. – Leonid Shifrin Mar 26 '11 at 20:15
  • @Leonid I understand these little beasts are impossible to debug, but your suggestion seems to go against the possibility of switching the redefinition on and off, which is a big gain, I think. – Dr. belisarius Mar 26 '11 at 20:24
  • @belisarius One thing you can do is to define three functions - `redef` and `redefOn`,`redefOff`, and make them share a local symbol. Calling `redef[f,codeBefore,codeAfter]` (`redef` must then be `HoldRest` or `HoldAll`) would generate the modified definition for `f` as above, and also two more (all inside the external `Module`): `redefOn[f]:=inFunction = True; redefOff[f]:=inFunction = False`. In this way, you expose the `Module`- generated trigger variable to the top-level in a controlled manner. – Leonid Shifrin Mar 26 '11 at 20:46
  • @Leonid Thanks. Nice idea. If I come to some usable code I will post it. – Dr. belisarius Mar 26 '11 at 20:51
3

(my first reply/post.... be gentle)

From my experience, the functionality appears to be a bit of a programming cul-de-sac. The ability to define custom notations seems heavily dependent on using the 'notation palette' to define and clear each custom notation. ('everything is an expression'... well, except for some obscure cases, like Notations, where you have to use a palette.) Bummer.

The Notation package documentation mentions this explicitly, so I can't complain too much.

If you just want to define custom notations in a particular notebook, Notations might be useful to you. On the other hand, if your goal is to implement custom notations in YourOwnPackage.m and distribute them to others, you are likely to encounter issues. (unless you're extremely fluent in Box structures?)

If someone can correct my ignorance on this, you'd make my month!! :)

(I was hoping to use Notations to force MMA to treat subscripted variables as symbols.)

telefunkenvf14
  • Welcome to StackOverflow. Thank you for relating your experience. You are not the first person to suggest that there are issues with `Notation` definitions and packages. Until now I have limited my modifications to things that do not require `Notation` or `$PreRead` (such as a two-argument `Fold`), but I want to explore the capabilities and limitations of these methods. – Mr.Wizard Mar 26 '11 at 13:05
  • @telefunkenvf14 Hi! I think I encountered you on the mathgroup a couple of times, right? Welcome here as well. – Sjoerd C. de Vries Mar 26 '11 at 19:19
  • @telefunkenvf14 Ha! your nickname reminds me of ol' time audio hacking here. Welcome. – Dr. belisarius Mar 26 '11 at 20:30
  • @telefunkenvf14 You can force MMA to treat subscripted variables as symbols thus: `Needs["Notation``"]; Symbolize[ ParsedBoxWrapper[SubscriptBox["_", "_"]]]`. You must use the `Notation` package but you need not use the palette. (That's one backtick in the `Needs` -- cannot figure out the markdown). – WReach Mar 27 '11 at 20:31
  • @WReach: If you need a backtick in the code, use double backticks to mark the code: i.e. ``Needs["Notation`"]``. (Of course, what happens if you need a backtick and a double backtick?) – Simon Mar 30 '11 at 03:56
  • @Simon Ahhh, thanks for that. I tried all kinds of combinations of backticks _within_ the markdown (trying to beat the five minute countdown timer for comments), but never thought to try changing the outer backticks themselves. – WReach Mar 30 '11 at 05:11
  • @Sjoerd Yep, I think I'm going to just contribute on Stack from now on. Better questions, answers and incentives. What's not to love? – telefunkenvf14 Apr 02 '11 at 11:33
  • @belisarius - I'm actually in economics... but secretly wish I was an electrical engineer. – telefunkenvf14 Apr 02 '11 at 11:36