
I'm trying to pull out comments in a MATLAB file. In MATLAB, comments are denoted with `%`, so the sensible thing would be to search for `%.*`. However, MATLAB also has functions like `sprintf` and `fprintf`, which allow something like `sprintf('x = %d', 5)`. That regex would also find `%d', 5)`, which I don't want. I'd also want to ignore variations such as `%s` or `%f`. Is there a way to capture only those segments that match `%.*` but are not enclosed in `'` characters?

To clarify: I'm generally trying to capture comments starting with `%`, but ignoring any `%` within string literals. The `sprintf` call was simply one example of such an occurrence that I want to ignore.

I found this question, which seems related, but no solutions posted there solve my problem.
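One way to express "a `%` that is not inside a `'...'` literal" uses only plain grouping, alternation and character classes, with no `(?` constructs. Anchor at the start of the line, then repeatedly consume either one character that is neither `'` nor `%`, or one complete quoted literal. The first `%` reached this way starts the comment.

Below is a minimal sketch in Python, assuming a regex engine where `^` matches at the start of each line. The names `COMMENT` and `comments` are illustrative only. The pattern does not understand MATLAB's transpose operator, so a line such as `y = x'; % note` is missed. It also ignores block comments (`%{ ... %}`) and double-quoted strings, so treat it as an approximation.

```python
import re

# ^           start of each line (re.MULTILINE)
# (...)*      any run of: one character that is not a quote, '%' or newline,
#             or a complete '...' literal on the same line. MATLAB's escaped
#             quote ('it''s') simply reads as two adjacent literals.
# (%.*)       the first '%' reached outside a literal, up to the end of the line
COMMENT = re.compile(r"^([^'%\n]|'[^'\n]*')*(%.*)", re.MULTILINE)


def comments(source: str) -> list:
    """Return the comment text (group 2) of every line that has one."""
    return [m.group(2) for m in COMMENT.finditer(source)]


if __name__ == "__main__":
    src = (
        "x = sprintf('x = %d', 5); % format x\n"
        "if (true) %this is true\n"
        "fprintf('%s %f\\n', s, v)\n"
        "%whole-line comment\n"
    )
    print(comments(src))
    # ['% format x', '%this is true', '%whole-line comment']
```

In a tool whose regex flavour has no multiline flag, for example line-oriented `grep`, `^` already means "start of line". In that case the same pattern, `^([^'%]|'[^']*')*(%.*)`, can be used with the second group as the comment.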

Asked by zephyr
  • If you assume that all 'real' comments have a leading space, `%\s.*` might work. – sco1 Oct 01 '15 at 17:42
  • That is generally the case, but might not be always. I'd prefer something that captured the cases where someone commented without a space. – zephyr Oct 01 '15 at 17:46
  • If you're confident with vim then this answer might help [Match by syntax highlighting instead of expressions](http://vi.stackexchange.com/a/3573/4655) – Steve Oct 01 '15 at 17:55
  • @Steve Unfortunately, given my constraints, I cannot use plugins. Nice find though! – zephyr Oct 01 '15 at 17:57
  • Another idea: if you use the MATLAB editor to publish, it produces an HTML version of the script and puts all comments in a `<span class="comment">` element. Maybe these would be easier to remove. Pretty convoluted and not particularly automated, but if it's just for a few big files it might work. – Steve Oct 01 '15 at 18:01
  • In addition, there's the MATLAB function [`publish`](http://uk.mathworks.com/help/matlab/ref/publish.html). – Steve Oct 01 '15 at 18:06
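If a regex is not a hard requirement, a small line-by-line scanner can track whether it is inside a string literal. This also lets it apply a rule for MATLAB's transpose operator, which a single regex cannot easily tell apart from an opening quote. The sketch below is Python and is an approximation, not a full MATLAB lexer. The function names `line_comment` and `matlab_comments` are hypothetical.

It rests on these assumptions:
- A `'` counts as a transpose, not the start of a string, when the character immediately before it is a letter, digit, `_`, `)`, `]`, `}`, `.` or another `'`.
- Doubled delimiters (`''`, `""`) are escapes inside literals.
- `%{` and `%}` act as block-comment markers only when they stand alone on their line.
- Text after a `...` continuation is treated as a comment.

```python
from typing import List, Optional

# Characters that, immediately before a quote, make it a transpose operator
# (e.g. x', A(1)', c{2}', x.', x'') instead of the start of a string literal.
# This is a heuristic assumption, not MATLAB's full grammar.
TRANSPOSE_AFTER = set("_)]}.'")


def line_comment(line: str) -> Optional[str]:
    """Return the comment part of one line, or None if it has none."""
    quote = None  # delimiter of the literal we are inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == quote:
                if line[i + 1:i + 2] == quote:
                    i += 2  # doubled delimiter ('' or "") is an escaped quote
                    continue
                quote = None  # literal closed
            i += 1
            continue
        if ch == "%":
            return line[i:]  # comment runs to the end of the line
        if line.startswith("...", i):
            return line[i + 3:]  # text after a continuation is ignored by MATLAB
        if ch == '"':
            quote = '"'
        elif ch == "'":
            prev = line[i - 1] if i > 0 else ""
            if not (prev.isalnum() or prev in TRANSPOSE_AFTER):
                quote = "'"  # start of a character-vector literal
        i += 1
    return None


def matlab_comments(source: str) -> List[str]:
    """Collect line comments, continuation comments and %{ ... %} blocks."""
    found = []
    depth = 0  # nesting depth of %{ ... %} block comments
    for line in source.splitlines():
        marker = line.strip()
        if marker == "%{":
            depth += 1
            found.append(line)
        elif depth:
            if marker == "%}":
                depth -= 1
            found.append(line)
        else:
            text = line_comment(line)
            if text is not None:
                found.append(text)
    return found


if __name__ == "__main__":
    example = (
        "function [out] = myfun(n)\n"
        "% Comment\n"
        "out = ['% Not a ',... this is a comment too\n"
        "    'comment'];\n"
        "fprintf('%d',n)%do this\n"
        "%{\n"
        " Multiline\n"
        " comment\n"
        "%}\n"
    )
    print(matlab_comments(example))
    # ['% Comment', ' this is a comment too', '%do this', '%{', ' Multiline', ' comment', '%}']
```

Keeping the per-line logic in its own function makes the transpose rule easy to adjust if a codebase hits a case the heuristic gets wrong.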

2 Answers


My final regex:

^(^[^']+|[^']+('.*')+[^']+)?(;|,)\s*%(?<com>.*)|^(\s)*%(?<com2>.*)

Called from MATLAB, with each `'` doubled inside the pattern string:

regexp('%i am a comment', '^(^[^'']+|[^'']+(''.*'')+[^'']+)?(;|,)\s*%(?<com>.*)|^(\s)*%(?<com2>.*)', 'names')

Response:

com2: 'i am a comment'
com: []

regexp('printf () ; %i am a comment after a command', '^(^[^'']+|[^'']+(''.*'')+[^'']+)?(;|,)\s*%(?<com>.*)|^(\s)*%(?<com2>.*)', 'names')

Response:

com2: []
com: 'i am a comment after a command'

regexp('printf ('' % i m not a comment '') , %i am a comment after a command followed by comma', '^(^[^'']+|[^'']+(''.*'')+[^'']+)?(;|,)\s*%(?<com>.*)|^(\s)*%(?<com2>.*)', 'names')

Response:

com2: []
com: 'i am a comment after a command followed by comma'

This case checks that a `%` inside a string literal is not caught:

regexp('printf('' ;%i m not a comment '');', '^(^[^'']+|[^'']+(''.*'')+[^'']+)?(;|,)\s*%(?<com>.*)|^(\s)*%(?<com2>.*)', 'names')

ans =

0x0 struct array with fields:
com2
com

The comments are returned in the fields `com` and `com2` of the struct produced by the `'names'` option.
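Because the asker applies regex outside MATLAB, it may help to check this pattern in another engine. The sketch below is an illustrative Python port and is not part of the original answer. It makes three substitutions:
- MATLAB's `(?<name>...)` groups become Python's `(?P<name>...)`.
- The doubled `''` in the MATLAB test strings become single `'`.
- `re.search(...).groupdict()` stands in for `regexp(..., 'names')`.

The last test line reproduces the limitation raised in the comments below: a comment on a line with no `;` or `,` before it, such as `if (true) %this is true`, is not found.

```python
import re

# The answer's pattern, with MATLAB's (?<name>...) written as Python's (?P<name>...).
PATTERN = re.compile(
    r"^(^[^']+|[^']+('.*')+[^']+)?(;|,)\s*%(?P<com>.*)|^(\s)*%(?P<com2>.*)"
)


def names(line: str):
    """Rough stand-in for regexp(line, pattern, 'names'): dict of named groups, or None."""
    m = PATTERN.search(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    tests = [
        "%i am a comment",
        "printf () ; %i am a comment after a command",
        "printf (' % i m not a comment ') , %i am a comment after a command followed by comma",
        "printf(' ;%i m not a comment ');",
        "if (true) %this is true",
    ]
    for t in tests:
        print(names(t))
```
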

Answered by Abr001am
  • That is similar to something I had tried, but doesn't do fully what I want. First, it picks up any trailing `;` in the line, and second, it doesn't catch any comment after a line not ending in `;`. – zephyr Oct 01 '15 at 17:51
  • @zephyr do you mean a line ending with `,`? It's simple, just add it as an alternative! – Abr001am Oct 01 '15 at 17:52
  • No, ending in `;`. You have `;(\s)*%.*)` in that which, from the line `sprintf('%d',5); %comment` would find `; %comment`. What's more, it wouldn't find the comment in `if (true) %this is true`. – zephyr Oct 01 '15 at 17:55
  • @zephyr get rid of the first character using `string(2:end)`, lol. What's more? – Abr001am Oct 01 '15 at 17:57
  • @zephyr the tokens are stored in `com` and `com2` – Abr001am Oct 01 '15 at 18:26
  • Sorry to be so particular, but I cannot use the group structure `(?`. – zephyr Oct 01 '15 at 18:40
  • @zephyr use 'names' instead of 'match' – Abr001am Oct 01 '15 at 18:47
  • @zephyr happy satisfied now !! :D – Abr001am Oct 01 '15 at 19:03
  • As I said, I cannot use the `(?` group structure. With the regex available to me, that is not a valid option. Thanks for all the effort you put in though. Hopefully it will help someone else who may have a similar issue. – zephyr Oct 01 '15 at 19:08
  • That is not an option for me. What's more, I'm only using regex on matlab files, not regex from matlab. – zephyr Oct 01 '15 at 19:10

This doesn't meet the question's requirements, but I thought I'd share it anyway.

If MATLAB is available, you can use the `publish` function and then pull out the comments with grep.

For example, take the following function in `myfun.m`:

function [out] = myfun(n) 
% Comment
out = ['% Not a ',... this is a comment too
    'comment'];
fprintf('%d',n)%do this
%{
 Multiline
 comment
%}

we run

publish('myfun.m')

which produces the file `html/myfun.html`. Now, with e.g. bash, we can run

grep -o -e '<span class="comment">[^<]*</span>' html/myfun.html

which returns

<span class="comment">% Comment</span>
<span class="comment"> this is a comment too</span>
<span class="comment">%do this</span>
<span class="comment">%}</span>

This is not quite complete, because `publish` splits the block comment across lines like this:

<span class="comment">%{
</span><span class="comment"> Multiline
</span><span class="comment"> comment, n&gt;2
</span><span class="comment">%}</span>

Extracting this needs a multiline search; see the question "How can I search for a multiline pattern in a file?"
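If a short script is acceptable, the multiline problem can be sidestepped by matching the spans over the whole file instead of line by line. Below is a minimal Python sketch. It assumes the published HTML marks comments exactly as `<span class="comment">...</span>`, as in the output above. `re.DOTALL` lets the lazy `.*?` cross newlines, and `html.unescape` turns entities such as `&gt;` back into characters. The function name `published_comments` is hypothetical.

```python
import html
import re

# Lazy .*? stops at the first closing tag; DOTALL lets it span newlines,
# so comments that publish split across lines are still captured.
SPAN = re.compile(r'<span class="comment">(.*?)</span>', re.DOTALL)


def published_comments(page: str) -> list:
    """Return the text of each comment span, unescaped and without trailing newlines."""
    return [html.unescape(text).rstrip("\n") for text in SPAN.findall(page)]


if __name__ == "__main__":
    sample = (
        '<span class="comment">%{\n'
        '</span><span class="comment"> Multiline\n'
        '</span><span class="comment"> comment, n&gt;2\n'
        '</span><span class="comment">%}</span>'
    )
    print(published_comments(sample))
    # ['%{', ' Multiline', ' comment, n>2', '%}']
```

On the real output, `published_comments(open("html/myfun.html", encoding="utf-8").read())` would apply the same extraction, assuming the file is UTF-8 encoded.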

Answered by Steve