
I need to do some metaprogramming on a large Mathematica code base (hundreds of thousands of lines of code) and don't want to have to write a full-blown parser, so I was wondering how best to get the code out of a Mathematica notebook in an easily parsed syntax.

Is it possible to export a Mathematica notebook in FullForm syntax, or to save all definitions in FullForm syntax?

The documentation for Save says that it can only export in the InputForm syntax, which is non-trivial to parse.

The best solution I have so far is to evaluate the notebook and then use DownValues to extract the rewrite rules with arguments (but this misses symbol definitions) as follows:

DVs[_] := {}
DVs[s_Symbol] := DownValues[s]
stream = OpenWrite["FullForm.m"];
WriteString[stream, 
  DVs[Symbol[#]] & /@ Names["Global`*"] // Flatten // FullForm];
Close[stream];

I've tried a variety of approaches so far but none are working well. Metaprogramming in Mathematica seems to be extremely difficult because it keeps evaluating things that I want to keep unevaluated. For example, I wanted to get the string name of the infinity symbol using SymbolName[Infinity] but the Infinity gets evaluated into a non-symbol and the call to SymbolName dies with an error. Hence my desire to do the metaprogramming in a more suitable language.

EDIT

The best solution seems to be to save the notebooks as package (.m) files by hand and then translate them using the following code:

stream = OpenWrite["EverythingFullForm.m"];
WriteString[stream, Import["Everything.m", "HeldExpressions"] // FullForm];
Close[stream];
Charles
J D

2 Answers


You can certainly do this. Here is one way:

exportCode[fname_String] := 
 Function[code, 
    Export[fname, ToString@HoldForm@FullForm@code, "String"], 
    HoldAllComplete]

For example:

fn = exportCode["C:\\Temp\\mmacode.m"];
fn[
  Clear[getWordsIndices];
  getWordsIndices[sym_, words : {__String}] := 
      Developer`ToPackedArray[words /. sym["Direct"]];
];

And importing this as a string:

In[623]:= Import["C:\\Temp\\mmacode.m","String"]//InputForm
Out[623]//InputForm=
"CompoundExpression[Clear[getWordsIndices], SetDelayed[getWordsIndices[Pattern[sym, Blank[]], \
Pattern[words, List[BlankSequence[String]]]], Developer`ToPackedArray[ReplaceAll[words, \
sym[\"Direct\"]]]], Null]"

However, going to another language to do metaprogramming for Mathematica sounds ridiculous to me, given that Mathematica is very well suited for that. There are many techniques available in Mathematica to do meta-programming and avoid premature evaluation. One that comes to mind is described in this answer, but there are many others. Since you can operate on parsed code and use Mathematica's pattern matching, you save a lot. You can browse the SO Mathematica tags (past questions) and find lots of examples of meta-programming and evaluation control.
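As a small illustration of operating on parsed code with pattern matching, here is a minimal sketch that lists the user-defined symbols in a piece of held code without evaluating any of them. The names heldCode and globalSymbols are made up for this example, and it assumes the code was entered in the default Global` context:

```mathematica
(* Hypothetical example code, kept unevaluated inside Hold *)
heldCode = Hold[a = 3; f[x_] := a + x];

(* Collect every Global` symbol that appears in the held code.
   Context has the HoldFirst attribute and HoldForm keeps each match inert,
   so no symbol is evaluated even if it already has a value. *)
globalSymbols = DeleteDuplicates @ Cases[
    heldCode,
    s_Symbol /; Context[s] === "Global`" :> HoldForm[s],
    {0, Infinity},
    Heads -> True
  ]
```

The result is a list of HoldForm-wrapped symbols, which can be processed further (for example, passed to a function with a Hold attribute) without triggering evaluation.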

EDIT

To ease your pain with auto-evaluating symbols (there are only a few, actually, Infinity being one of them): if you just need to get a symbol name for a given symbol, then this function will help:

unevaluatedSymbolName =  Function[sym, SymbolName@Unevaluated@sym, HoldAllComplete]

You use it as

In[638]:= unevaluatedSymbolName[Infinity]//InputForm
Out[638]//InputForm="Infinity"

Alternatively, you can simply add the HoldFirst attribute to the SymbolName function via SetAttributes. One way is to do that globally:

SetAttributes[SymbolName,HoldFirst]; SymbolName[Infinity]//InputForm

Modifying built-in functions globally is, however, dangerous, since it may have unpredictable effects in a system as large as Mathematica, so undo the change afterwards:

ClearAttributes[SymbolName, HoldFirst];

Here is a macro to use that locally:

ClearAll[withUnevaluatedSymbolName];
SetAttributes[withUnevaluatedSymbolName, HoldFirst];
withUnevaluatedSymbolName[code_] :=
  Internal`InheritedBlock[{SymbolName},
     SetAttributes[SymbolName, HoldFirst];
     code]

Now,

In[649]:= 
withUnevaluatedSymbolName[
   {#,StringLength[#]}&[SymbolName[Infinity]]]//InputForm

Out[649]//InputForm=  {"Infinity", 8}

You may also wish to do some replacements in a piece of code, say, replace a given symbol by its name. Here is some example code (which I wrap in Hold to prevent its evaluation):

c = Hold[Integrate[Exp[-x^2], {x, -Infinity, Infinity}]]

The general way to do replacements in such cases is using Hold-attributes (see this answer) and replacements inside held expressions (see this question). For the case at hand:

In[652]:= 
withUnevaluatedSymbolName[
       c/.HoldPattern[Infinity]:>RuleCondition[SymbolName[Infinity],True]
]//InputForm

Out[652]//InputForm=
Hold[Integrate[Exp[-x^2], {x, -"Infinity", "Infinity"}]]

This is not the only way to do it, though. Instead of using the above macro, we can also encode the modification to SymbolName into the rule itself (here I am using a more wordy form of in-place evaluation, the Trott-Strzebonski trick, but you can use RuleCondition as well):

ClearAll[replaceSymbolUnevaluatedRule];
SetAttributes[replaceSymbolUnevaluatedRule, HoldFirst];
replaceSymbolUnevaluatedRule[sym_Symbol] :=
  HoldPattern[sym] :> With[{eval = SymbolName@Unevaluated@sym}, eval /; True];

Now, for example:

In[629]:= 
Hold[Integrate[Exp[-x^2],{x,-Infinity,Infinity}]]/.
      replaceSymbolUnevaluatedRule[Infinity]//InputForm
Out[629]//InputForm=
    Hold[Integrate[Exp[-x^2], {x, -"Infinity", "Infinity"}]]

Actually, this entire answer is a good demonstration of various meta-programming techniques. From my own experience, I can direct you to this, this, this, this and this answer of mine, where meta-programming was essential to solving the problem I was addressing. You can also judge by the fraction of functions in Mathematica that carry Hold-attributes: it is about 10-15 percent, if memory serves me well. All those functions are effectively macros, operating on code. To me, this is a very indicative fact, telling me that Mathematica builds heavily on its meta-programming facilities.

Leonid Shifrin
  • @Jon Harrop In fact, I think that from a third to a half of my posts on Mathematica SO tag use one or another (often several) forms of meta-programming to achieve their goals, and so do many other people here. While I agree that infinite evaluation model and a rather complex evaluator make it harder to do meta-programming, it is not only entirely possible but is quite routinely used. For a really short example of Mathematica meta-programming, see this recent question http://stackoverflow.com/questions/8240943/data-table-manipulation-in-mathematica-step-2/ and those it refers to. – Leonid Shifrin Nov 26 '11 at 22:21
  • Aargh, my MMA trial doesn't support `Export` so I cannot run your code (am OOF)! Playing with your code, it seems to use `HoldAllComplete` and `HoldForm` to prevent the evaluation of the function parameters and argument to `ToString` in order to convert a given block of code into a string in `FullForm` syntax. That's great but how do I apply it to several existing notebooks? – J D Nov 26 '11 at 23:44
  • One of the problems I wanted to solve is to write Mathematica code to create the dependency graph for the definitions in a notebook. How would you do that? For example, given `a=3; f[x_]:=a+x` you would get `{f->a}` and could then do `PlotGraph` to visualize it. – J D Nov 26 '11 at 23:45
  • @Jon Harrop I've actually done that several times in variations (dependencies). You can post it as a separate question and I'll try to dig out the code. You can start very simple, but the proper treatment of local variables etc is a harder task. You can look at my package here: mathprogramming-intro.org/download/packages/…, which finds inter-package function dependencies. David Wagner published his dependency analysis code in Mathematica journal and also in his book, "Power programming with Mathematica: the kernel". I think his treatment is a good starting point. – Leonid Shifrin Nov 27 '11 at 00:11
  • Thanks for the references. Regarding the applicability of Mathematica, I think the answers here really demonstrate just how hard Mathematica is making this easy problem. I've got problems with the Mathematica kernel dying silently for no known reason, problems with Mathematica failing to parse its own files, problems with it evaluating expressions when I don't want it to and so on. I'm going to have to solve much harder problems than this to get the job done and I've been struggling for days to do trivial metaprogramming with Mathematica... – J D Nov 27 '11 at 01:20
  • @Jon Harrop Well, I use M a lot and while I agree that things you mentioned happen, they don't happen very often in my work. From the languages I use (C, Java, Javascript and M), M has by far the best meta-programming facilities. I can imagine that in LISP or ML - families of languages certain things are much simpler. One approach to meta-programming which deals with the unwanted evaluation problem was used in `SymbolicC` functionality present in M8, and heavily used in `CUDALink` and `OpenCLLink` implementations - use completely inert heads to represent language constructs. – Leonid Shifrin Nov 27 '11 at 01:30
  • I just keep hitting strange problems when using Mathematica like the one described in the comments on the answer below here. The author has asked it as a question: http://stackoverflow.com/questions/8283256/problems-interpreting-input-cell-box-expressions – J D Nov 27 '11 at 03:10

The full forms of expressions can be extracted from the Code and Input cells of a notebook as follows:

$exprs =    
  Cases[
    Import["mynotebook.nb", "Notebook"]
  , Cell[content_, "Code"|"Input", ___] :>
      ToExpression[content, StandardForm, HoldComplete]
  , Infinity
  ] //
  Flatten[HoldComplete @@ #, 1, HoldComplete] & //
  FullForm

$exprs is assigned the expressions read, wrapped in HoldComplete to prevent evaluation. $exprs could then be saved into a text file:

Export["myfile.txt", ToString[$exprs]]

Package files (.m) are slightly easier to read in this way:

Import["mypackage.m", "HeldExpressions"] //
Flatten[HoldComplete @@ #, 1, HoldComplete] &
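As a rough sketch of how the held expressions from a package file might be written out in FullForm, one top-level expression per line: the file names below are placeholders, and held and lines are names invented for this example. It assumes the "HeldExpressions" import element is available in your version and returns a list of HoldComplete[...] expressions:

```mathematica
(* "mypackage.m" and "mypackage-fullform.txt" are placeholder file names *)
held = Import["mypackage.m", "HeldExpressions"];  (* list of HoldComplete[...] *)

(* FullForm is applied to each still-held expression, so nothing is evaluated *)
lines = ToString[FullForm[#]] & /@ held;

(* Join the strings with newlines and write them out as plain text *)
Export["mypackage-fullform.txt", StringJoin[Riffle[lines, "\n"]], "Text"]
```

Each line of the output then starts with HoldComplete[, which an external parser can strip off or treat as an ordinary head.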
WReach
  • Looks fantastic but doesn't work. Firstly, a `Cell` is often followed by other stuff so I had to add an extra `___` after the `"Code"|"Input"` but now I get `\[LeftSkeleton]707\[RightSkeleton]` in the output because it had been truncated and lots of errors including `Syntax::stresc: Unknown string escape \T.`. Any ideas? – J D Nov 27 '11 at 00:18
  • As for the `Cell[...]` pattern, I neglected to account for cell options. Oops. Fixed as you suggest. As for the skeleton characters, it appears that some expressions are being written in `Short` form. I'm not sure why. If you change the `ToString` expression to `ToString[$exprs, StandardForm]`, does that help? – WReach Nov 27 '11 at 00:37
  • @WReach: That's just it, there are no calls to `ToString`! I'm just trying to evaluate the first expression (defining `$exprs`) and haven't even got to that line yet! I don't understand why anything would have been truncated. On another notebook I get errors including `ToExpression::esntx: "Could not parse \!\(BoxData[RowBox[{\"Find\", \"\[TripleDot]\", RowBox[{RowBox[{RowBox[<<1>>], \":=\", \" \", RowBox[<<1>>]}], \";\"}]}]]\) as Mathematica input."`. – J D Nov 27 '11 at 01:17
  • Ah, unless this truncated stuff is from an output cell in the notebook itself. But, even then, why would Mathematica's parser die on syntax generated by Mathematica itself?! – J D Nov 27 '11 at 01:24
  • I can reproduce your problems with some of my notebooks. I'm stumped. I've taken it to the community: [Problems interpreting input cell box expressions](http://stackoverflow.com/q/8283256/211232) – WReach Nov 27 '11 at 02:42
  • Meanwhile, saving the notebooks as package (.m) files might avoid that problem but how do I read the expressions in without evaluating them? Mathematica's `Import` doesn't support its own package file format. The `ReadList` function always seems to evaluate its result. Loading the file as a string and applying `ToExpression` once gives incorrect output because it fails to handle compound expressions separated by newlines... – J D Nov 27 '11 at 03:24
  • The `Import` command shown at the bottom of my response ought to work for ".m" files -- what happens? In another comment you said that your trial version of Mma does not support `Export` -- perhaps it does not fully support `Import` either? – WReach Nov 27 '11 at 03:35
  • Next problem: the (undocumented?!) `HeldExpressions` format for `Import` is not supported on Mathematica 5.2 and newer versions broke backward compatibility with `LongDash` and, consequently, cannot handle these notebooks. :-( – J D Jan 10 '12 at 14:44