I'm working on a parser for Matlab, using a whole bunch of code from the Matlab Central File Exchange as test data. While sifting through some of it, I found that some of the code I downloaded legitimately shouldn't parse (i.e. Matlab itself won't accept it).

Is there an easy way to check if an m-file (either function or script) contains syntax errors -- perhaps some library function? I'm not looking to run the code, just see if it should parse.

Ismail Badawi
  • 2
    look up CHECKCODE/MLINT or the undocumented MTREE functions – Amro Dec 18 '13 at 22:07
  • It sounds interesting.. Out of curiosity, is your project publicly accessible? – Amro Dec 18 '13 at 22:53
  • 1
    @Amro it's part of the McLab project at McGill university (http://www.sable.mcgill.ca/mclab/, https://github.com/Sable/mclab). I thought it simpler to say "I'm working on a parser" but that's not quite true :s – Ismail Badawi Dec 18 '13 at 22:55
  • very nice! Here is a look at how many submissions are using these parsing functions: http://mcbench.cs.mcgill.ca/list?query=%2F%2FParameterizedExpr[is_call%28%27mtree%27%29+or+is_call%28%27mlint%27%29+or+is_call%28%27checkcode%27%29]&query_id=12 – Amro Dec 18 '13 at 23:38

2 Answers

7

If you are willing to use undocumented functions, consider the following:

function isValid = checkValidMFile(file_name)
    % make sure file can be found
    fname = which(file_name);
    assert(~isempty(fname) && exist(fname,'file')~=0, 'file not found');

    % parse M-file and validate
    t = mtree(fname, '-file');
    if count(t) == 0 || (count(t)==1 && iskind(t,'ERR'))
        isValid = false;
    else
        isValid = true;
    end
end

(You could also pass it a string of MATLAB code instead of a saved file name, by omitting the '-file' flag.)
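As a minimal sketch of the string form, assuming mtree treats its first argument as source text when the '-file' flag is omitted (the helper name isValidCode is hypothetical, and it relies on the same undocumented calls as the function above):

```matlab
function isValid = isValidCode(code)
    % code: char vector of MATLAB source, e.g. 'x = 1;'
    % Without '-file', mtree parses the text itself (undocumented behavior).
    t = mtree(code);
    % Same test as checkValidMFile: an empty tree or a lone ERR node
    % is treated as a failed parse.
    isValid = ~(count(t) == 0 || (count(t) == 1 && iskind(t, 'ERR')));
end
```

Something like isValidCode(fileread('some_file.m')) should then behave much like the file-based version, with the same caveat that this is not an officially supported interface.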

Of course, mtree gives a lot more information: it can return the full parse tree, as well as any warnings or errors. I have previously used it to differentiate between scripts and functions, and to strip all comments from an M-file.

It is unfortunately not officially supported, so you will have to browse its source code to figure out what everything means (thankfully it is well commented). The function uses the internal mtreemex MEX-function.
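For the script-versus-function case, one possible sketch looks at the kind of the root node. The helper name classifyMFile is hypothetical, and the node kinds 'FUNCTION' and 'CLASSDEF' as well as the root method are taken from a reading of the mtree source, so treat them as assumptions that may differ between releases:

```matlab
function kind = classifyMFile(file_name)
    % Parse the file with the undocumented mtree class.
    t = mtree(file_name, '-file');
    if count(t) == 0
        kind = 'empty';          % nothing to classify
        return
    end
    r = root(t);                 % root node of the parse tree
    if iskind(r, 'ERR')
        kind = 'error';          % the file did not parse
    elseif iskind(r, 'FUNCTION')
        kind = 'function';
    elseif iskind(r, 'CLASSDEF')
        kind = 'classdef';
    else
        kind = 'script';         % anything else is treated as a script
    end
end
```

Falling back to 'script' for every other root kind is a simplification that seems reasonable for ordinary M-files.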


Additional (undocumented) ways:

builtin('_mcheck', 'some_file.m')

and

checkSyntacticWarnings('./path/to/folder/')
Amro
  • 1
    This seems close to what I want, but it doesn't seem to catch all syntax errors. For instance, Matlab gives an error for `[1,,2]`, but `mtree('[1,,2]')` doesn't. I guess Matlab catches that error after the parsing stage for some reason? – Ismail Badawi Dec 18 '13 at 22:43
  • huh you are right! Actually it seems to correct it :) `tree2str(mtree('[1,,2]'))` returns `[1,2]`. Try my other suggested solutions.. – Amro Dec 18 '13 at 22:49
4

Since R2011b, the way to parse Matlab code is via checkcode. In older versions of Matlab you can use mlint (in R2013a and later, and maybe earlier, mlint just calls checkcode). Both rely on a private undocumented function called mlintmex. You can learn a bit more about this function and related topics on the Undocumented Matlab website.
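A rough sketch of how checkcode could be used for this: called with an output argument, it returns a struct array with one element per message instead of printing them. The helper name hasParseErrorMessage is hypothetical, and matching on the text "parse error" is only a heuristic that assumes that wording appears in the message; other syntax problems may be reported with different wording and would slip through.

```matlab
function tf = hasParseErrorMessage(file_name)
    % checkcode analyzes the file without running it.
    info = checkcode(file_name);
    if isempty(info)
        tf = false;              % no messages at all
        return
    end
    msgs = {info.message};       % one message string per reported issue
    % Heuristic (assumption): look for "parse error" in the message text.
    hits = regexpi(msgs, 'parse error', 'once');
    tf = any(~cellfun(@isempty, hits));
end
```

Calling checkcode(file_name, '-id') instead adds an id field to each struct, which may be a more stable thing to match on than the message text.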

Another potentially-related project is Linguist, which is used by GitHub and others to classify languages and uses pygments.rb to highlight code. It supports Matlab. A while back the Matlab support used to be hit-or-miss, but I think that it has improved. These won't validate code, but they may be useful for what you're doing.

horchler
  • Is there a way to distinguish syntax errors from other problems? I have the same problem here as with the other answer; e.g. `[1,,2]` gets a warning like `a comma cannot immediately follow another comma`, while others get something with `parse error` which is easy to look for. – Ismail Badawi Dec 18 '13 at 22:46