Mathematica.SE is currently in private beta and will open to the public in a few days. Stack Overflow and related sites use prettify.js; however, Mathematica is not a supported language. It would be pretty awesome to have a custom highlighting script for our site, and I request the JavaScript and CSS community's help in developing such a script and the accompanying CSS.

I've listed below a few basic requirements such that it captures most of the features of Mathematica's default highlighting scheme (ignoring stuff that only the internal parser would know). I've also named the colours generically – hexadecimal colour codes can be picked from the screenshots I've provided (further below). I've also added code samples to accompany the screenshots so that folks can test it out.

Basic requirements

  1. Comments
    These are entered as (* comment *). So anything between these should be highlighted in gray.

  2. Strings
    These are entered as "string" (single quotes are not supported), and should be highlighted in pink.

  3. Operators/short hand notations
    Apart from the standard +, -, *, /, ^, ==, etc., Mathematica has several other operators and short hand notations. The most commonly encountered ones are:

    @, @@, @@@, /@, //@, //, ~, /., //., ->, :>, /:, /;, :=, :^=, =., 
    &, |, ||, &&, _, __, ___, ;;, [[, ]], <<, >>, ~~, <>
    

    These, and parentheses, brackets and braces should all be highlighted in black.

  4. Pattern objects and slots
    Pattern objects start with a letter and have either _ or __ or ___ attached, like for example, x_, x__ and x___. These can also have additional letters following the underscore, as x_abc, etc. All of these should be highlighted in green.

    Slots are # and ## and can also be followed by an integer as #1, ##4, etc., and should also be in green.

    Both of these (pattern objects and slots) are usually terminated by an operator/bracket/shortform from point 3 above.

  5. Functions/variables
    "Functions and variables" is rather loose terminology here, but it serves the purposes of this post. Anything not falling in the above 4 can be highlighted in black. Mathematica often uses backticks ` in code, and these should be considered part of the function/variable name, e.g. abcd`defg. A dollar sign $ anywhere in a variable name is to be treated just like a letter (i.e., nothing special).

For all of the above, if they appear inside strings, they should be treated as such, i.e. "@~#" should be highlighted in pink.

Additional nice to haves:

  1. In the pattern objects in point 4 above, if the underscore(s) is followed by a ? and then some letters, then the part following the underscore(s) should be in black. E.g., in x__?abc, the x__ part must be in green and the ?abc in black.
  2. If a function/variable starts with a capital letter, then it is highlighted in black. If it starts with a small letter, it is highlighted in blue. Internally, this differentiates built-in functions vs. user defined functions. However, the Mathematica community (pretty much everywhere) sticks to this naming convention fairly well, so distinguishing the two would serve some purpose.
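
To make nice-to-have 2 concrete, here is a rough sketch of the capitalization heuristic in JavaScript. The function name and the returned labels are placeholders made up for illustration; they are not prettify.js classes, and the real front end decides this by looking up definitions rather than by the first letter.

```javascript
// Sketch of the capitalization heuristic only. The labels are hypothetical
// and would still need to be mapped to CSS classes.
function classifySymbol(name) {
  if (/^[A-Z]/.test(name)) return "system-style"; // would be black
  if (/^[a-z]/.test(name)) return "user-style";   // would be blue
  return "plain"; // e.g. names starting with $, left in the default colour
}

console.log(classifySymbol("Transpose"));
console.log(classifySymbol("arrows"));
```

A name with a context prefix such as Developer`PartitionMap is classified by its first character here, since the backtick is treated as part of the name.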

Screenshots & code samples:

1. Simple examples

Here's a small example set, with a screenshot at the end showing how it looks in Mathematica:

(*simple pattern objects & operators*)
f[x_, y__] := x Times @@ y  

(*pattern objects with chars at the end and strings*)

f[x_String] := x <> "hello@world" 

(*pattern objects with ?xxx at the end*)

f[x_?MatrixQ] := x + Transpose@x

<< Combinatorica` (*example with backticks and inline comment*)

(*Slightly more complicated example with a mix of stuff*)

Developer`PartitionMap[Total, Range@1000, 3][[3 ;; -3]]~Partition~2 //
  Times @@@ # &

[Screenshot: the code above as it looks in Mathematica]

2. A real world example

Here's an example from this answer of mine that also indicates my point 2 in the "Additional nice to haves" section, i.e., lowercase stuff being highlighted in blue.

Also, you might notice some of the variables highlighted in orange – I purposefully didn't include that as a requirement, as I think that's going to be a lot harder to do without a parser that knows Mathematica.

prob = MapIndexed[#1/#2 &, 
    Accumulate[
     EuclideanDistance[{0, 0}, #] < 1 & /@ arrows // Boole]]~N~4;

Manipulate[
 Graphics[{White, Rectangle[{-5, -5}, {5, 5}], Red, Disk[{0, 0}, 1], 
   Black, Point[arrows[[;; i]]], 
   Text[Style[First@prob[[i]], Bold, 18, "Helvetica"], {-4.5, 4.5}]}, 
  ImageSize -> 200], {i, Range[2, 20000, 1]}, 
 ControlType -> Manipulator, SaveDefinitions -> True]

[Screenshot: the code above as it looks in Mathematica]

Is this feasible? Too much? Too hard? Impossible?

Quite frankly, I don't know the answer to any of those. I just listed some basic features that everyone on mathematica.SE would love to have and some additional stuff that would be a cherry on the top. However, do let me know if these are too difficult to implement. We can work out a smaller subset of features.

In recognition of this help, you all have the Mathematica community's eternal gratitude and in addition, I'll award a 500 bounty to each person that contributes significantly to this (if it's done in parts by different folks) – I'll rely on your votes/comments/output on the answers to decide what's significant (perhaps more than one bounty to one person if they do all the work). Implementing the "Additional nice to haves" gets an automatic +500 regardless of previous bounties, so you can also build upon the work of others even if you don't do the first half. I might also periodically place smaller bounties to attract users who might not have seen this question, so if you happen to earn those bounties, they'll be in addition to the "bounty to reward an existing answer" which will be decided towards the end.

Lastly, I'm not in a hurry. So please take your time with this question. The bounty is always an option until it is implemented by SE (or if it has been determined that existing answers satisfy the requirements completely). Ideally, I'm hoping to get this implemented 2/3rds of our way into the beta, which is 2 months from now.

  • Regarding nice-to-have item #2: I'm sure that the people who develop the popular Mathematica packages do stick to the capitalization convention, but in my circles, which consist of people who simply use it for research, that convention is virtually unheard of - as in, literally, nobody knows about it. So I'm not sure it's safe to rely on it. That being said, perhaps "enforcing" the convention in the syntax highlighting might convert a few people. – David Z Jan 22 '12 at 05:44
  • @DavidZaslavsky My experience has been the other way round – people who develop popular packages actually do use capital names (for the most part). This is of course, after they've done thorough testing to make sure there are no collisions and shadowing. From what I've seen on Stack Overflow, most folks stick to lower case, and are quick to point out this convention to others. I guess it depends on whether they've seen the Mathematica book or not – it's one of the first things Wolfram suggests :) –  Jan 22 '12 at 05:51
  • Ah, that would make sense. My impression is that the Mathematica book is not as popular "in the wild" as one might like it to be. – David Z Jan 22 '12 at 06:06
  • Can you add another "nice to have"? Any way to differentiate indexing (`a[[i]]`) from functions (`f[x]`) would be excellent! This should be possible if nested comments can be handled. – Szabolcs Jan 22 '12 at 14:24
  • Another nice to have: auto indenting ala Mma? – JxB Jan 22 '12 at 15:11
  • Yoda, which version of Mma are you using? The default comment colour in 8 is dark gray (at least for me). FYI, I get a "General::compat: Combinatorica Graph and Permutations functionality has been superseded by preloaded functionality..." message with your first code example. – JxB Jan 22 '12 at 15:18
  • @jxb I'm using version 8. Those examples were not necessarily meant to be evaluated and mean something. It was merely to highlight the presence of backticks, which is very common in mma. The second example was the "real world" example –  Jan 22 '12 at 15:47
  • @JxB re: the colours, I just realized from szabolcs' comment in chat that it is I that had turned comments to grey and strings to pink. I did this globally long time ago and forgot about it. You're right, mma uses various shades of gray for the two – which isn't very helpful –  Jan 22 '12 at 15:53
  • I don't know if this falls within the scope of prettify.js, but I'd like to see \[Omega] displayed as the symbol for the Greek letter omega, etc. – JxB Jan 23 '12 at 07:07
  • @JxB You can write an Ω already, even if you don't have a Greek keyboard layout installed like me ;-) just use LaTeX: $\Omega$. Regarding code blocks, it is essential that they are *copyable*, so any change to the text itself (as opposed to its styling) is unacceptable. – Szabolcs Jan 23 '12 at 13:00

2 Answers

Preface

Since the Mathematica support for google-code-prettify was mainly developed for the new Mathematica.Stackexchange site, please see also the discussion here.

Introduction

I have no deep knowledge of all of this, but I once wrote a cweb plugin for IDEA to get my code highlighted there. In an IDE, highlighting is not a one-step process: it is divided into several steps, and each step adds more highlighting abilities. Let me explain this a bit, so that I can later give some reasons why some things are (imho) not possible for the kind of code highlighter we need here.

At first the code is split into tokens, which are the single parts of a programming language. After this lexing step you can categorize intervals of your code as e.g. whitespace, literal, string, comment, and so on. The lexer eats the source code by testing regular expressions, storing the token type for a text span and stepping forward in the code.
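
As a rough illustration of that loop (test regular expressions at the current position, record the token type, step forward), here is a minimal sketch in JavaScript. The rule names and patterns are my own assumptions for the example and are not taken from prettify.js, which organizes its rules differently.

```javascript
// Minimal regex-driven lexer sketch. Rules are tried in order at the start
// of the remaining input; the first one that matches wins.
const rules = [
  ["whitespace", /^\s+/],
  ["comment", /^\(\*[\s\S]*?\*\)/],
  ["string", /^"(?:[^"\\]|\\[\s\S])*"/],
  ["identifier", /^[a-zA-Z$][a-zA-Z0-9$]*/],
  ["number", /^\d+/],
];

function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    let type = "other";
    let text = rest[0]; // fallback: consume a single character
    for (const [name, pattern] of rules) {
      const match = pattern.exec(rest);
      if (match) {
        type = name;
        text = match[0];
        break;
      }
    }
    tokens.push({ type, text });
    rest = rest.slice(text.length); // step forward in the code
  }
  return tokens;
}

console.log(tokenize('f[x] (*c*) "s"'));
```

Every character ends up in exactly one token, so joining the token texts gives back the original source. That property matters for a highlighter, because the code must stay copyable.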

After this lexical scan the source-code can be parsed by using the rules of the programming language, the tokens and the underlying code. For instance, if we have a token Plus which is of type Keyword then we know that the brackets and the parameter should follow. If not, the syntax is not correct. What you can build with this parsing is called an AST, abstract syntax tree, and looks basically like the TreeForm of Mathematica syntax.

With a nicely designed language, like Java for instance, it is possible to check the code while typing and make it almost impossible to write syntactically wrong code.

prettify.js and Mathematica Code

First, prettify.js implements only a lexical scanner, but no parser. I'm pretty sure that a parser would be impossible anyway, given the time constraints for displaying a web page. So let me explain what features are not possible/feasible with prettify.js:

Also, you might notice some of the variables highlighted in orange – I purposefully didn't include that as a requirement, as I think that's going to be a lot harder to do without a parser that knows Mathematica.

Right, because the highlighting of these variables depends on the context. You have to know, that you are inside a Table construct or something like that.

Hacking prettify.js

I think hacking an extension for prettify.js is not so hard. I'm an absolute regular expression noob, so be prepared for what follows.

We don't need so much stuff for a simple Mathematica lexer. We have whitespace, comments, string-literals, braces, a lot of operators, usual literals like variables and a giant list of keywords.

Let's start with the keywords in JavaScript regexp form:

Export["google-code-prettify/keywordsmma.txt", 
   StringJoin @@ Riffle[Apply[StringJoin, 
         Partition[Riffle[Names[RegularExpression["[A-Z].*"]], 
             "|"], 100], {1}], "'+ \n '"], "TEXT"]

The regular expression for whitespace and string-literals can be copied from another language. Comments are matched by something like

/^\(\*[\s\S]*?\*\)/

This goes wrong if we have comments inside comments, but for the moment I don't care. We have braces and brackets

/^(?:\[|\]|{|}|\(|\))/
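
Coming back to the nested-comment limitation mentioned above: the sketch below shows where the lazy comment pattern stops on a nested comment, next to a small depth-counting scanner that handles nesting. The helper nestedCommentLength is hypothetical and written only for this illustration; it is not part of prettify.js, which works with regular expressions.

```javascript
// The comment pattern from the answer: lazy, so it stops at the first "*)".
const flatComment = /^\(\*[\s\S]*?\*\)/;

// Hypothetical helper: length of the (possibly nested) comment at the start
// of `src`, or 0 if `src` does not start with a complete comment.
function nestedCommentLength(src) {
  if (!src.startsWith("(*")) return 0;
  let depth = 0;
  let i = 0;
  while (i < src.length) {
    if (src.startsWith("(*", i)) {
      depth++;
      i += 2;
    } else if (src.startsWith("*)", i)) {
      depth--;
      i += 2;
      if (depth === 0) return i;
    } else {
      i++;
    }
  }
  return 0; // unterminated comment
}

const sample = "(* outer (* inner *) still outer *) x";
console.log(flatComment.exec(sample)[0]);
console.log(sample.slice(0, nestedCommentLength(sample)));
```

The regular expression ends the comment after "inner *)", so the rest of the outer comment would be highlighted as code. The counter only returns once every opening "(*" has been closed.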

We have something like blub_boing which should be matched separately.

/^[a-zA-Z$]+[a-zA-Z0-9$]*_+([a-zA-Z$]+[a-zA-Z0-9$]*)*/

We have the slots #, ##, #1, ##9 (currently only one digit can follow)

/^#+[0-9]?/
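
A quick way to check the pattern and slot expressions above is to run them against a few inputs from the question. The helper matchAtStart and the multi-digit variant slotManyDigits are assumptions added for this sketch; only the first two expressions come from the answer.

```javascript
// The two expressions from the answer.
const patternRe = /^[a-zA-Z$]+[a-zA-Z0-9$]*_+([a-zA-Z$]+[a-zA-Z0-9$]*)*/;
const slotRe = /^#+[0-9]?/;

// Hypothetical variant that accepts any number of digits after the #.
const slotManyDigits = /^#+[0-9]*/;

// Return the text matched at the start of `text`, or null.
function matchAtStart(re, text) {
  const m = re.exec(text);
  return m ? m[0] : null;
}

console.log(matchAtStart(patternRe, "x_String]"));
console.log(matchAtStart(patternRe, "x__?abc"));
console.log(matchAtStart(slotRe, "#12 &"));
console.log(matchAtStart(slotManyDigits, "#12 &"));
```

Note that the pattern expression stops before the ? in x__?abc, which is the behaviour asked for in the first nice-to-have, and that it does not match a bare _Integer without a leading name.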

We have variable names and other literals. They need to start with either a letter or $ and then can follow letters, numbers and $. Currently \[Gamma] is not matched as one literal but for the moment it's ok.

/^[a-zA-Z$]+[a-zA-Z0-9$]*/

And we have operators (I'm not sure this list is complete).

/^(?:\+|\-|\*|\/|,|;|\.|:|@|~|=|\>|\<|&|\||_|`|\^)/
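
One thing to be aware of: the operator expression above matches a single character at a time, so @@@ comes out as three tokens. That makes no difference when all operators share one colour. If multi-character operators should ever be separate tokens, one possible approach (an assumption of mine, not what lang-mma.js does) is to try a longest-first alternation built from the operator list in the question before falling back to the single-character rule:

```javascript
// Single-character operator expression from the answer.
const singleCharOp = /^(?:\+|\-|\*|\/|,|;|\.|:|@|~|=|\>|\<|&|\||_|`|\^)/;

// Multi-character operators copied from the question's list (plus ==).
const multiCharOps = [
  "@@@", "//@", "//.", ":^=", "___", "@@", "/@", "//", "/.", "->", ":>",
  "/:", "/;", ":=", "=.", "||", "&&", "__", ";;", "[[", "]]", "<<", ">>",
  "~~", "<>", "==",
];

// Escape characters that are special inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Longest operators first, so that "@@@" wins over "@@".
const longestFirst = [...multiCharOps].sort((a, b) => b.length - a.length);
const multiCharOp = new RegExp(
  "^(?:" + longestFirst.map(escapeRegExp).join("|") + ")"
);

// Split a run of operator characters, trying the regexes in the given order.
function splitOperators(text, regexes) {
  const out = [];
  let rest = text;
  while (rest.length > 0) {
    let matched = null;
    for (const re of regexes) {
      const m = re.exec(rest);
      if (m) {
        matched = m[0];
        break;
      }
    }
    if (matched === null) break; // not an operator: stop
    out.push(matched);
    rest = rest.slice(matched.length);
  }
  return out;
}

console.log(splitOperators("@@@", [singleCharOp]));
console.log(splitOperators("@@@", [multiCharOp, singleCharOp]));
```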

Update

I cleaned things up a bit, did some debugging and created a color style which looks beautiful to me. As far as I can see, the following works correctly:

  • All system symbols which can be found through Names[RegularExpression["[A-Z].*"]] are matched and highlighted in blue
  • Braces and brackets are black but in a bold font weight. This was a suggestion from Szabolcs and I like it very much since it definitely adds some energy to the appearance of the code
  • Patterns, as they appear in function definitions, and the slots of pure functions are highlighted in green. This was suggested by Yoda and goes along with the highlighter in the Mathematica frontend. Patterns are only green in combination with a variable like in blub__Integer, a1_ or in b34_Integer32. Test functions for the pattern like in num_?NumericQ are only green in front of the question mark.
  • Comments and Strings have the same color. Comments and strings can go over several lines. Strings can include backslashed quotes. Comments cannot be nested.
  • For the coloring I used consistently the ColorData[1] scheme to ensure colors look nice side by side.

Currently it looks like that:

[Screenshot: sample code highlighted with the current version]

Testing and debugging

Szabolcs asked whether and how it is possible to test this. This is easy: You need my google-code-prettify source (Where can I put this, so that everyone has access?). Unpack the sources and open the file tests/mathematica_test.html in a web browser. This file itself loads the files src/prettify.js, src/lang-mma.js and src/prettify-mma-1.css.

  • In lang-mma.js you find the regular expressions the lexer uses when splitting the code into tokens.
  • In prettify-mma-1.css you find the style definitions I use.

To test your own code, simply open mathematica_test.html in an editor and paste your stuff between the pre tags. Reload the page and your code should appear.

Debugging: If the highlighter is not working correctly, you can debug with an IDE or with Google Chrome. In Chrome, mark the word where the highlighter starts to fail, right-click it and choose Inspect Element. What you see then is the underlying HTML of the highlighted code. There you can see every single token and which type each token has. It looks like this:

<span class="tag">[</span>

You see the open bracket is of type tag. This matches with the regexp definition I made in lang-mma.js. In Chrome it is even possible to browse the JS code, set breakpoints and debug it while reloading your page.
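
If clicking through the inspector gets tedious, the same information can be pulled out of the generated markup with a few lines of JavaScript. This is only a sketch: listTokens is a hypothetical helper, it assumes flat (non-nested) spans like the one shown above, and apart from tag the class names in the sample string are just examples of prettify's usual classes, not a statement about what lang-mma.js assigns.

```javascript
// Hypothetical helper: list (class, text) pairs from highlighter output.
// Assumes flat spans of the form <span class="...">...</span>.
function listTokens(html) {
  const spanRe = /<span class="([^"]*)">([\s\S]*?)<\/span>/g;
  const tokens = [];
  for (const m of html.matchAll(spanRe)) {
    tokens.push({ type: m[1], text: m[2] });
  }
  return tokens;
}

// Made-up sample markup for Range[10].
const html =
  '<span class="kwd">Range</span><span class="tag">[</span>' +
  '<span class="lit">10</span><span class="tag">]</span>';

for (const { type, text } of listTokens(html)) {
  console.log(type + "\t" + text);
}
```

The token text is still HTML-escaped at this point, so a < in the code shows up as &lt; in the list.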


Local installation for Google Chrome and Firefox

Tim Stone was kind enough to write a script which injects the highlighter while sites under http://stackoverflow.com/questions/ are loading. As soon as google-code-prettify is turned on for mathematica.stackexchange.com it should work there too. I adapted this script to use my lexical scanning rules and colors. I heard that in Firefox the script is not always working, but this is how to install it:

Versions

Under https://github.com/halirutan/Mathematica-Source-Highlighting/raw/master/mathematica-source-highlighter.user.js you will always find the most recent version. Here is some change history.

  • 02/23/2013 Updated the lists of symbols and keywords to Mathematica version 9.0.1
  • 09/02/2012 some minor issues with the coloring of Mathematica-patterns were fixed. For a detailed overview of features with Pattern-operator : see also the discussion here
  • 02/02/2012 support of many number input formats like .123`10.2 or 1.2`100.3*^-12, highlighting of In[23] and Out[4], ::usage or other messages like blub::boing, highlighting of patterns like ProblemTest[prob:(findp_[pfun_, pvars_, {popts___}, ___]), opts___], bug-fixes (I checked the parser against 3500 lines of package code from the AddOns directory. It took about 3-4 sec to run, which should be more than fast enough for our purposes.)
  • 01/30/2012 Fixed missing '?' in the operator list. Included named characters like \[Gamma] to give a complete match for such symbols. Added $variables in the keyword list. Improved the matching of patterns. Added matching of context constructions like Developer`PackedArrayQ. Switched the color scheme due to many requests. Now it's like in the Mathematica frontend: keywords black, variables blue.
  • 01/29/2012 Tim hacked the injecting code. Now the highlighting works on mathematica.stackexchange too.
  • 01/25/2012 Added the recognition of Mathematica-numbers. This should now highlight things like {1, 1.0, 1., .12, 16^^1.34f, ...}. Additionally it should recognize the backtick behind a number. I switched comments and strings to gray and use a dark red for the numbers.
  • 01/23/2012 Initial version. Capabilities are described under section Update.
halirutan
  • Could you explain to us JavaScript-laymen how to test this? – Szabolcs Jan 22 '12 at 08:40
  • In your update, why are some of the system functions in blue and the others in black? Did you intend for them to be that way? – abcd Jan 23 '12 at 04:39
  • Those functions are user-functions. They just start with a capital letter. As stated in the post, I match everything from `Names[RegularExpression["[A-Z].*"]]`. With this everyone sees the difference between user and kernel functions. – halirutan Jan 23 '12 at 04:54
  • @halirutan ohhh... never mind. My bad. I'm so used to seeing blue for user defined and black for system that I didn't realize that you had flipped them =) – abcd Jan 23 '12 at 05:30
  • Oh ... I don't care much about the highlighting style, but I think blue for system functions is a bad idea. "Blue" says "undefined" for Mathematica users. I know you changed this because there were complaints about the **bold**. – Szabolcs Jan 23 '12 at 09:23
  • Is there any chance we could make this into a userscript (to be installed into Chrome / Greasemonkey), and host it somewhere so people can install (and test!) with a single click? Tim put his [here](https://github.com/tms/stack-utilities/blob/master/stack-utilities-mathematica-prettify.user.js). You can just click the [raw](https://github.com/tms/stack-utilities/raw/master/stack-utilities-mathematica-prettify.user.js) link to install (if your browser supports it, i.e. Chrome/Greasemonkey) – Szabolcs Jan 23 '12 at 09:23
  • I chose blue for keywords for three reasons. First, it is consistent with most other languages like Java, C, Lisp, Matlab, etc, so it gives a familiar feeling. Second reason is, when you look at the Wolfram Workbench, people from Wolfram highlight known symbols in blue too. And third, it would maybe feel too much colored when every literal is in blue. – halirutan Jan 23 '12 at 11:41
  • @Szabolcs, I'm the prettify maintainer. Both Halirutan and I would like the mathematica mode to end up in the master repo for prettify, so any prettify user-script that uses the latest version of prettify would get it. – Mike Samuel Feb 02 '12 at 19:49

Not exactly what you are asking for, but I created a similar extension for MATLAB (based on the excellent work already done here). The project is hosted on github.

The script should solve some of the issues common for MATLAB code on Stack Overflow:

  • comments (no need to use tricks like %# ..)
  • transpose operator (single quote) is correctly recognized as such (confused with quoted strings by the default prettifier)
  • highlighting of popular built-in functions

Keep in mind the syntax highlighting is not perfect; among other things, it fails on nested block comments (I can live with that for now). As always, comments/fixes/issues are welcome.

A separate userscript is included; it allows switching the language used, as seen in the screenshots below:

[Screenshots: before and after]

For those interested, a third userscript is provided, adapted to work on "MATLAB Answers" website.


TL;DR

Install the userscript for SO directly from:

https://github.com/amroamroamro/prettify-matlab/raw/master/js/prettify-matlab.user.js

Amro
  • this is great! The poor highlighting support used to annoy me to no end... thanks a lot for this! If you look under halirutan's answer, you'll see a comment by Mike Samuel. He's the prettify maintainer and you could talk to him directly to see if he'd include this in his repo. Then, SO can directly use your highlighter without a script (btw, they installed halirutan's custom highlighter on mathematica.se). – abcd May 04 '12 at 14:43
  • Having used it now for a while, I really enjoy browsing the MATLAB tag now with this userscript. I wish the IDE also highlighted built-in functions like yours. That will immediately reduce all the cases where they use something like `max/min/std/var` etc. as a variable _and_ a function – abcd May 04 '12 at 22:14
  • @Bringbackspy: I'm glad you find it useful. I would love for this to be part of the official prettify, for now I hope to get it tested and get more feedback from the good people of SO.. As you have some experience from being a moderator on a few other SE sites, do you have any advice/suggestions for promoting this to the MATLAB community on SO? (Unfortunately we are not as active as you guys in MM!) – Amro May 05 '12 at 15:30
  • I tried getting the MATLAB community involved more [by creating a chatroom](http://chat.stackoverflow.com/rooms/info/2793/matlab) a long time ago, but no one joined (I didn't know how to promote then either). I would suggest just commenting to the regulars first and directing them here and getting them to use your script. There might be questions, suggestions and bugs, so make use of a chatroom for this. I'll get a moderator to revive that MATLAB room and we can get it back running. I've seen that being more active in chat helps build a better and tighter community – abcd May 05 '12 at 15:59
  • Also, halirutan's highlighter is now integrated fully into [mathematica.se] engine and no longer needs to be used as a script. In the comments on the [meta post](http://meta.mathematica.stackexchange.com/a/369), balpha (SE dev) mentioned a few reasons why the script won't be implemented on SO — the most important reason being the size of the file. Since MATLAB has fewer keywords/functions than Mathematica (I'm assuming you didn't include _all_ the toolboxes), it should be fairly small. If the size of your file is comparable to, say, SQL/PHP in the `prettify.js` bundle, then there's a chance – abcd May 05 '12 at 16:02
  • thanks, chatrooms do sound like a good place for such discussions, even though I'm not much of the chatty type :) I have read the meta post you mentioned, and you are right in that the code only tries to recognize the subset of built-in functions (no toolboxes at all), and even then I only picked the most common ones (around 600). If it proves to be a deal breaker in terms of file size, we can always just drop detecting builtin functions (after all, MATLAB IDE has no such thing as you previously noted) – Amro May 05 '12 at 16:50