
I want to extend the highlight.js capabilities for the R language so that (1) all function names that are followed by an opening parenthesis ( and (2) all package names that are followed by the :: or ::: operators are highlighted (as in RStudio, see Fig.1). The parentheses (, ) and the operators ::, ::: themselves should not be highlighted.

Fig.1. Desired highlighting of R code parts (function and package names).

My example consists of two files: index.html and r.min.js.

HTML file:

<html lang="en-us">
<head> <meta charset="utf-8">
    <link href='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/agate.min.css' rel='stylesheet' type='text/css' />
</head>

<body>

<pre class="r"><code>doc_name &lt;-
    officer::read_docx() %&gt;% 
    flextable:::body_add_flextable(table_to_save) %&gt;% 
    print(target = &quot;word.docx&quot;)

.libPaths()

c("a", "b")

package::function()$field
</code></pre> 

<script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@9.12.0/build/highlight.min.js"></script>
<script src="r.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>

</body>
</html>

r.min.js file:

hljs.registerLanguage("r",function(e){var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";return{c:[e.HCM,{b:r,l:r,k:{keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass ...",literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},r:0},{cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},{cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},{cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},{cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{b:"`",e:"`",r:0},{cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]},

/* My attempt... */
/* ... to highlight function names between double 
and triple colons and opening parenthesis (in red as symbol): */
{cN:"symbol",b:":::|::",e:"\\(",eB:!0,eE:!0},

/* ... to highlight other function names (in red as symbol): */
{cN:"symbol",  b:"([a-zA-Z]|\.[a-zA-Z.])[a-zA-Z0-9._]*",e:"\\(",eE:!0},

/* ... to highlight package names (in cyan as variable): */
{cN:"variable",b:"(?<!\w)",e:":::|::",eE:!0},

]}});

r.min.js is based on (this file) and contains the highlight.js rules that identify R code elements. The lines I added are below the comment "My attempt...". Meanings of the abbreviations: cN - CSS class name, b - "begin", e - "end", eB - "exclude begin", eE - "exclude end"; other meanings are explained here.
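To make the abbreviated rules easier to read, here is a minimal sketch that rewrites the short keys into long names. The helper expandRule and the table LONG_NAMES are hypothetical names made up for this illustration and are not part of highlight.js. The entries beyond cN, b, e, eB and eE reflect my understanding of how the 9.x minified builds shorten keys, so treat them as an assumption.

```javascript
// Hypothetical lookup table: short keys as they appear in r.min.js,
// mapped to the long names used in unminified language definitions.
// Entries other than cN, b, e, eB, eE are an assumption about the 9.x build.
const LONG_NAMES = new Map([
  ["cN", "className"],
  ["b", "begin"],
  ["e", "end"],
  ["eB", "excludeBegin"],
  ["eE", "excludeEnd"],
  ["c", "contains"],
  ["k", "keywords"],
  ["l", "lexemes"],
  ["r", "relevance"],
  ["v", "variants"],
]);

// Recursively copy a rule, renaming every known short key.
// Strings, booleans, numbers and RegExp objects are returned unchanged.
function expandRule(rule) {
  if (Array.isArray(rule)) {
    return rule.map(expandRule);
  }
  if (rule === null || typeof rule !== "object" || rule instanceof RegExp) {
    return rule;
  }
  const expanded = {};
  for (const [key, value] of Object.entries(rule)) {
    const longKey = LONG_NAMES.has(key) ? LONG_NAMES.get(key) : key;
    expanded[longKey] = expandRule(value);
  }
  return expanded;
}

// One of the rules from the attempt above, written with `true` instead of `!0`.
const attempt = { cN: "symbol", b: ":::|::", e: "\\(", eB: true, eE: true };
console.log(expandRule(attempt));
```

Read this way, the rule says: start a "symbol" span at :: or :::, end it at (, and leave both delimiters outside the span.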

The result I get (Fig.2) is not satisfactory. It seems that the regular expressions I use do not find the correct beginnings and ends of the desired parts of the R code.
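One possible contributing factor, offered as a sketch rather than a full diagnosis: two of the added rules write the pattern as a JavaScript string with a single backslash (\. and \w), whereas the base pattern stored in var r uses \\. In a string literal a single backslash before . or w is dropped before the regex engine sees it. The variable names below are made up for this illustration.

```javascript
// Pattern as written in the added "symbol" rule (single backslash).
const singleBackslash = "([a-zA-Z]|\.[a-zA-Z.])[a-zA-Z0-9._]*";
// Pattern as written in `var r` of the original file (double backslash).
const doubleBackslash = "([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";

// The string literal silently drops the backslash, leaving a bare dot
// that matches any character instead of a literal period.
console.log(singleBackslash); // the backslash is gone
console.log(doubleBackslash); // the backslash survives

// Anchored versions show the behavioural difference on "+ab":
const loose = new RegExp("^(?:" + singleBackslash + ")");
const strict = new RegExp("^(?:" + doubleBackslash + ")");
console.log(loose.test("+ab"), strict.test("+ab"));

// The same happens to the lookbehind in the "variable" rule:
// the string "(?<!\w)" actually contains "(?<!w)".
const lookbehindSource = "(?<!\w)";
console.log(lookbehindSource);
```

Writing the patterns as regex literals (for example /\w+/) or doubling the backslashes avoids this particular problem. It does not by itself guarantee the desired highlighting.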

Fig.2. The result using modified r.min.js

What should be the correct highlight.js code in r.min.js to get the parts of R code highlighted as in RStudio?

GegznaV
  • This doesn't answer your question, but an alternative approach is to use RMarkdown and get the highlighting done by R, not by `highlight.js`. The highlighter there uses the R parser, not regular expression matching, so it is guaranteed to match the R language. – user2554330 Jun 30 '18 at 10:57

1 Answer


Sounds like a worthwhile improvement, so I tinkered with it for a while.

At first glance, this should be fairly easy.

A regex to capture the package name prefixes could be written like this (demo):

\w+(?=:::?)

and for function names like this (demo):

\.?\w+(?=\()
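As a quick check outside of highlight.js, the two patterns can be run against the sample R code from the question in plain JavaScript. This is only a sketch of what the regexes capture on their own; the variable names are made up for the illustration.

```javascript
// The R sample from the question, with HTML entities decoded.
const sample = [
  "doc_name <-",
  "    officer::read_docx() %>% ",
  "    flextable:::body_add_flextable(table_to_save) %>% ",
  '    print(target = "word.docx")',
  "",
  ".libPaths()",
  "",
  'c("a", "b")',
  "",
  "package::function()$field",
].join("\n");

// Package names: word characters directly followed by :: or :::.
const packageNames = Array.from(sample.matchAll(/\w+(?=:::?)/g), (m) => m[0]);

// Function names: optional leading dot, word characters, then "(".
const functionNames = Array.from(sample.matchAll(/\.?\w+(?=\()/g), (m) => m[0]);

console.log(packageNames);
// -> [ 'officer', 'flextable', 'package' ]
console.log(functionNames);
// -> [ 'read_docx', 'body_add_flextable', 'print', '.libPaths', 'c', 'function' ]
```

Because the lookaheads do not consume the operators or the parenthesis, only the names themselves are captured. One caveat: \w does not match a dot, so for a call such as read.csv( the function pattern would capture only .csv.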

Unfortunately, these patterns are not so easily applied to the highlight.js language parsing rules.

After some trial and error, I settled on the following code, which gives fairly consistent highlighting:

/* ... to highlight other function names (in orange as a keyword): */
{
    cN: "keyword",
    b: /(^|\s*)(:::?|\.)\w+(?=\(|$)/
},
/* ... to highlight package names (in red as meta): */
{
    cN: "meta",
    b: /(^|\s*)\w+(?=:::?|$)/,
    r: 0
},
  • I use the cN (className) keyword for functions: it fits what they are, and it interferes less with the predefined style for functions.
  • The same goes for package names, where I suggest using the cN meta. This is what other packages use for similar constructs, and again, it gives a more consistent result with the built-in styles, e.g. for numbers.
  • I've also added print and c to the list of keywords. The list for the R language is obviously somewhat incomplete. Arguably, every function name (even from third-party packages) should be added as a keyword (this is how some other languages do it), but that's not very practical.
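To see what the two begin patterns capture on their own, they can be tried on single lines in plain JavaScript. This sketch deliberately ignores how highlight.js combines the rules (rule order, keywords and relevance all play a role there), so it only shows the raw regex behaviour. The helper firstMatch is a made-up name for this illustration.

```javascript
// The two `b` patterns from the rules above.
const functionRule = /(^|\s*)(:::?|\.)\w+(?=\(|$)/;
const packageRule = /(^|\s*)\w+(?=:::?|$)/;

// Return the text of the first match, or null when nothing matches.
function firstMatch(pattern, line) {
  const match = pattern.exec(line);
  return match ? match[0] : null;
}

const lines = [
  "officer::read_docx()",
  "    flextable:::body_add_flextable(table_to_save) %>% ",
  ".libPaths()",
  'print(target = "word.docx")',
  "package::function()$field",
];

for (const line of lines) {
  console.log(JSON.stringify(line));
  console.log("  function rule:", JSON.stringify(firstMatch(functionRule, line)));
  console.log("  package rule: ", JSON.stringify(firstMatch(packageRule, line)));
}
```

In plain JavaScript the function rule returns "::read_docx" for the first line, so the operator is part of the match, and it returns null for the print(...) line, which is presumably why print and c had to go into the keyword list. The package rule returns "officer" for the first line and "    flextable" (with its leading spaces) for the second.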

This is what I get.

Sample Code:

hljs.registerLanguage("r",function(e){var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";return{c:[e.HCM,{b:r,l:r,k:{keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass c print ...",literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},r:0},{cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},{cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},{cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},{cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},{b:"`",e:"`",r:0},{cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]},
{cN: "keyword", b: /(^|\s*)(:::?|\.)\w+(?=\(|$)/},
{cN: "meta",b: /(^|\s*)\w+(?=:::?|$)/,r: 0 }, ]}});

hljs.initHighlightingOnLoad();
<html lang="en-us">
<head> <meta charset="utf-8"><link href='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/agate.min.css' rel='stylesheet' type='text/css' />
</head><body>

    <pre class="r"><code>library(officer)
doc_name &lt;-
    officer::read_docx() %&gt;% 
    flextable:::body_add_flextable(table_to_save) %&gt;% 
    print(target = &quot;word.docx&quot;)

.libPaths()
x = 4
c("a", "b")

package::function()$field
</code></pre>
<script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@9.12.0/build/highlight.min.js"></script>

</body></html>

Pretty close, but far from perfect. The main hurdle here is that I struggle to fully understand how the parser interprets the patterns. Some of the results simply make no sense to me, but they still work.

wp78de
  • I believe the suggested vanilla regex solution does not work because of [issues with lookaheads in highlight.js](https://github.com/isagalaev/highlight.js/issues/1550). There is even an open [pull request](https://github.com/isagalaev/highlight.js/pull/1349) that addresses this issue but never made it into the source. – wp78de Jul 01 '18 at 19:32
  • PS: I would look into other JS highlighters if this becomes a major obstacle. For instance, [rainbow.js](https://github.com/ccampbell/rainbow/blob/master/src/language/r.js) does a good job highlighting R and looks easier to adjust; setting colors is mostly a matter of identifying the detected style classes and setting their colors in theme.css; [screenshot](https://ibb.co/iuh8jy). – wp78de Jul 01 '18 at 20:41
  • It's a pity that `highlight.js` has problems with lookaheads. Nevertheless, @wp78de, your answer and the comments were really helpful. – GegznaV Jul 07 '18 at 23:33