1

I am writing a Java program that parses some shell code, and I want to remove the content inside echo statements. To start with, I want to match the whole echo command. My current pattern looks like this:

Pattern pat = Pattern.compile("echo[\\t ]+\".*?\"");

This will match echo + at least one space or tab + double quotes + the smallest number of characters (I used the reluctant quantifier) + double quotes.

The problem is when I have an echo like this:

echo "This will not \" work";

My pattern will match only up to the quote right after the backslash. What could I do to fix this?
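
For reference, a minimal sketch that reproduces the problem. The class name `EchoOriginalDemo` and the helper `firstMatch` are made up for illustration; only the pattern itself is from my code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EchoOriginalDemo {
    // The pattern from the question, unchanged.
    static final Pattern PAT = Pattern.compile("echo[\\t ]+\".*?\"");

    // Hypothetical helper: returns the first match in the line, or null if there is none.
    static String firstMatch(String line) {
        Matcher m = PAT.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shell text: echo "This will not \" work";
        String line = "echo \"This will not \\\" work\";";
        System.out.println(firstMatch(line));
        // prints: echo "This will not \"
    }
}
```

The reluctant `.*?` stops at the first double quote it can, and that happens to be the escaped one.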

Sorin
  • Have a look [here](http://stackoverflow.com/questions/17043454/using-regexes-how-to-efficiently-match-strings-between-double-quotes-with-embed); but be aware that this is far from being the only way to quote arguments in a shell line. – fge Jul 11 '13 at 11:43
  • How else could you do it? – Sorin Jul 12 '13 at 08:17
  • For instance: `echo This\ wi'l'"l no"t' "'\ wor'k'` <-- this will produce the exact same result as the quoted command in your example – fge Jul 12 '13 at 08:30

2 Answers

2

You can use a negative look-behind to ensure that the character right before the closing quote isn't a \:

"echo[\\t ]+\".*?(?<!\\\\)\""

In the Java string literal, \\\\ represents a single \ character. The regex needs \\ to match one literal backslash, and then each of those backslashes needs to be escaped again for the compiler.

Test.
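
A quick check of the look-behind pattern, as a minimal sketch. The class name `EchoLookbehindDemo` and the helper `firstMatch` are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EchoLookbehindDemo {
    // (?<!\\) in regex terms: the closing quote must not be preceded by a backslash.
    static final Pattern PAT = Pattern.compile("echo[\\t ]+\".*?(?<!\\\\)\"");

    // Hypothetical helper: returns the first match in the line, or null if there is none.
    static String firstMatch(String line) {
        Matcher m = PAT.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shell text: echo "This will not \" work";
        String line = "echo \"This will not \\\" work\";";
        System.out.println(firstMatch(line));
        // prints: echo "This will not \" work"
    }
}
```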

A problem with the above is that echo "\\" will not match (presumably \ is an escape character and \\ means the \ character). A more correct method might be to check for \'s, and consume the character following each \:

"echo[\\t ]+\"(\\\\.|[^\\\\])*?\""

Test.
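
To compare the two patterns on both inputs, something like the following sketch could be used. The class name `EchoEscapeDemo`, the constant names, and the helper `firstMatch` are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EchoEscapeDemo {
    // Look-behind version: the closing quote must not be preceded by a backslash.
    static final Pattern LOOKBEHIND = Pattern.compile("echo[\\t ]+\".*?(?<!\\\\)\"");
    // Escape-consuming version: a backslash always takes the next character with it.
    static final Pattern CONSUME = Pattern.compile("echo[\\t ]+\"(\\\\.|[^\\\\])*?\"");

    // Hypothetical helper: returns the first match in the line, or null if there is none.
    static String firstMatch(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String escapedBackslash = "echo \"\\\\\"";                 // shell text: echo "\\"
        String escapedQuote = "echo \"This will not \\\" work\";"; // shell text: echo "This will not \" work";

        System.out.println(firstMatch(LOOKBEHIND, escapedBackslash)); // prints: null
        System.out.println(firstMatch(CONSUME, escapedBackslash));    // prints: echo "\\"
        System.out.println(firstMatch(CONSUME, escapedQuote));        // prints: echo "This will not \" work"
    }
}
```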

Bernhard Barker
  • That is a rather complicated solution compared to loop unrolling :p – fge Jul 11 '13 at 12:08
  • @fge `\"(\\\\.|.)*?\"` VS `\"[^\\\\\"]*(\\\\.[^\\\\\"]*)*\"`? I'm more inclined to say that the prior is simpler. – Bernhard Barker Jul 11 '13 at 12:26
  • Not me. Being short is one thing, being understandable is another ;) What is more, your regex will match `"""` – fge Jul 11 '13 at 13:07
  • @fge If given `echo """` it will find `echo ""`, which, from the question, as stated, appears to be correct. I interpreted the question as being a `find`, not a `matches`. – Bernhard Barker Jul 11 '13 at 13:15
  • Except this is not valid in the shell, and I very much doubt OP wants that. – fge Jul 11 '13 at 13:47
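
For comparison, the unrolled-loop form quoted in the comments above can be tried with a sketch like this. The `echo[\\t ]+` prefix is added here to line it up with the question, and the class name `EchoUnrolledDemo` and helper `firstMatch` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EchoUnrolledDemo {
    // Unrolled loop: a run of ordinary characters, then repeated
    // (escape pair + run of ordinary characters), then the closing quote.
    static final Pattern PAT =
            Pattern.compile("echo[\\t ]+\"[^\\\\\"]*(\\\\.[^\\\\\"]*)*\"");

    // Hypothetical helper: returns the first match in the line, or null if there is none.
    static String firstMatch(String line) {
        Matcher m = PAT.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shell text: echo "This will not \" work";
        System.out.println(firstMatch("echo \"This will not \\\" work\";"));
        // prints: echo "This will not \" work"

        // Shell text: echo "\\"
        System.out.println(firstMatch("echo \"\\\\\""));
        // prints: echo "\\"
    }
}
```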
0

You can create an alternation that will match \" specifically:

echo[\\t ]+\"([^\\\\]|\\\\\")*?\"
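
A minimal sketch of this pattern in use; the class name `EchoAlternationDemo` and the helper `firstMatch` are made up for the example. Note that the alternation only allows a backslash when a double quote follows it, so a line containing any other escape (such as `\\`) appears not to match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EchoAlternationDemo {
    // Inside the quotes: either any character except a backslash, or the pair \"
    static final Pattern PAT = Pattern.compile("echo[\\t ]+\"([^\\\\]|\\\\\")*?\"");

    // Hypothetical helper: returns the first match in the line, or null if there is none.
    static String firstMatch(String line) {
        Matcher m = PAT.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shell text: echo "This will not \" work";
        System.out.println(firstMatch("echo \"This will not \\\" work\";"));
        // prints: echo "This will not \" work"

        // Shell text: echo "a\\b" (a backslash that is not followed by a quote)
        System.out.println(firstMatch("echo \"a\\\\b\""));
        // prints: null
    }
}
```
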
Explosion Pills