Shell remove string including newlines

Question

I am working on a custom source patcher and I'm having trouble replacing one string with another when the match includes newlines.

For instance, I want to remove this pattern:

\n/* @patch[...]*/

In other words, I want to go from this:

this.is = code ;
/* @patch beta
    blah blah
*/
if (!this.is) return 0 ;
/* @patch end */

... to this:

this.is = code ;
if (!this.is) return 0 ;

And not this:

this.is = code ;
<- newline
if (!this.is) return 0 ;
<- newline

In my shell script, I use the sed command to do this:

sed -e "s|\/\* @patch.*\*\/||g" $file > $file"_2"

This works pretty well, but the newlines are still there.

This variant doesn't work, because sed reads its input one line at a time and so never sees the newline:

sed -e "s|\n\/\* @patch.*\*\/||g" $file > $file"_2"
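The failure is easy to reproduce on a tiny made-up input. In this sketch (the function name `demo_sed_newline` is only illustrative), the pattern spans a line break, and sed leaves both lines untouched because each line is handed to the script without its trailing newline.

```shell
# Hypothetical demo: feed two lines to sed and try to match across the line break.
demo_sed_newline() {
  printf 'one\ntwo\n' | sed 's/one\ntwo/joined/'
}

demo_sed_newline   # prints both lines unchanged: the pattern never matches
```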

Neither the method from "How can I replace a newline (\n) using sed?" nor tr (the second answer in the same thread) works.

Would you have any solution to this? Even heavy ones are fine; performance is not important here.

P.S.: I am working on a web application, so the files in question are JavaScript. I'm on Mac OS X Yosemite, but whatever the system, this seems to be a common issue for all bash users.

I found another solution using Node.js, for those who have trouble with their awk version:

node -e "console.log(process.argv[1].replace(/[\n\r]\/\* @patch([\s\S]*?)\*\//mg, ''))" "`cat $filepath`"

`d` deletes the current line. Checking a good tutorial will help: http://www.grymoire.com/Unix/sed.html — Jason Hu, Jun 23 '15 at 15:38
I just found a duplicate answer that seems quite good. Also, deleting my answer since I was just covering one-line comments (for posterity, it was based on @EtanReisner 's suggestion: `sed '/^\/\* @patch.*\*\/$/d' file`) — fedorqui, Jun 23 '15 at 16:02
P.S.: The linked 'already answered' topic does not answer my question at all. Please unmark my question as a duplicate unless you find a real solution to my issue elsewhere. — Tot, Jun 23 '15 at 16:08
@Tot it seems that you have some different tools here. Update your question to explain what your exact system is, what sed you are using, etc. Otherwise, people will keep updating their answers after each of your updates. Reopening the question; I had marked it as a duplicate of [Remove multi-line comments](http://stackoverflow.com/q/13061785/1983854) — fedorqui, Jun 23 '15 at 16:14
@fedorqui Alright, I understand your point. But that topic didn't answer the main problem either, as it was too specific to C. That is why I didn't understand the duplicate mark. — Tot, Jun 23 '15 at 16:31
@fedorqui Indeed, it performs correctly with my files. :) However it removes all comments (I don't know how to put the `@patch` rule in that regex), plus it doesn't remove the newlines where the comments have been erased. — Tot, Jun 24 '15 at 08:43

Ed Morton · Accepted Answer · 2015-06-24T15:27:35.130

sed is for simple substitutions on individual lines; for anything else you should be using awk:

$ awk -v RS='^$' -v ORS= '{gsub(/[*][/]/,"\0"); gsub(/\n[/][*] @patch[^\0]+\0/,""); gsub(/\0/,"*/")} 1' file
this.is = code ;
if (!this.is) return 0 ;

The above uses GNU awk for multi-char RS to read the whole file as a single string (with other awks you just build up the string line by line and process it in the END section) and relies on your file not containing any NUL (\0) characters.

The first gsub() changes every */ to one char (a NUL) so the 2nd gsub() can negate it in a bracket expression as part of your desired regexp and then the third gsub() restores any remaining NULs to */s.
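The three steps may be easier to follow spelled out one per line. The sketch below is an illustration rather than the exact script above: it builds the string line by line so that it does not depend on GNU awk, and it uses awk's predefined SUBSEP character as the placeholder instead of NUL, on the assumption that the input never contains that character. The function name `strip_patch_comments` is made up for the example.

```shell
# Hypothetical helper: remove every "/* @patch ... */" comment together with
# the newline that precedes it. Assumes the file never contains SUBSEP.
strip_patch_comments() {
  awk '
    { rec = rec $0 RS }                  # accumulate the whole file in one string
    END {
      ph = SUBSEP                        # single-character placeholder
      gsub(/[*]\//, ph, rec)             # step 1: every "*/" becomes the placeholder
      # step 2: newline, "/* @patch", anything but the placeholder, then the placeholder
      gsub("\n/[*] @patch[^" ph "]+" ph, "", rec)
      gsub(ph, "*/", rec)                # step 3: restore the remaining "*/"
      printf "%s", rec
    }
  ' "$1"
}
```

Called as `strip_patch_comments "$file" > "${file}_2"`, it should reduce the sample from the question to the two desired lines while leaving ordinary comments alone. A patch comment on the very first line has no preceding newline, so it would not be matched.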

With non-gawk you need to build up the string:

awk '{rec = rec $0 RS} END{gsub(/[*][/]/,"\0",rec); gsub(/\n[/][*] @patch[^\0]+\0/,"",rec); gsub(/\0/,"*/",rec); printf "%s",rec}' file

and it sounds like your awk requires the /s in the bracket expressions escaped so it doesn't see them as the terminating char of the RE:

awk '{rec = rec $0 RS} END{gsub(/[*][\/]/,"\0",rec); gsub(/\n[\/][*] @patch[^\0]+\0/,"",rec); gsub(/\0/,"*/",rec); printf "%s",rec}' file

If your awk doesn't like NUL chars then use some control character, e.g. (where every ^C is a literal control-C character):

awk '{rec = rec $0 RS} END{gsub(/[*][\/]/,"^C",rec); gsub(/\n[\/][*] @patch[^^C]+^C/,"",rec); gsub("^C","*/",rec); printf "%s",rec}' file

or use the pre-defined SUBSEP control char that awk uses to separate array indices (note that you now need to double up the backslashes in the REs that are concatenations of literal strings with SUBSEPs, since they are now dynamic regexps instead of constant regexps; see http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps for details):

awk '{rec = rec $0 RS} END{gsub(/[*][\/]/,SUBSEP,rec); gsub("\\n[\\/][*] @patch[^"SUBSEP"]+"SUBSEP,"",rec); gsub(SUBSEP,"*/",rec); printf "%s",rec}' file
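All of the placeholder variants rely on the placeholder character not already being present in the file. If that is in doubt, a pre-check along these lines could be run first. This is only a sketch, and `contains_subsep` is a hypothetical name, not something from the scripts above.

```shell
# Hypothetical helper: exit status 0 if the file contains the SUBSEP character,
# 1 otherwise.
contains_subsep() {
  awk 'index($0, SUBSEP) { found = 1; exit }   # stop at the first hit
       END { exit !found }' "$1"
}

# Usage sketch: warn before running the SUBSEP-based script on such a file.
# if contains_subsep "$file"; then
#   echo "placeholder character already present in $file" >&2
# fi
```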

edited Jun 24 '15 at 15:27

answered Jun 23 '15 at 17:35

Ed Morton


I get a kind of syntax error: `[...] awk: nonterminated character class [*][ source line number 1`. I suspected the space between `ORS=` and what follows. When I remove this space, I get this error: `[...] awk: bailing out at source line 1`. :( – Tot Jun 24 '15 at 08:48
The space you removed is necessary as I'm setting ORS to be empty. Whatever awk you are using might need the `/` in `[/]` escaped so it becomes `[\/]`, but it sounds like you're not using gawk. Try `awk --version` to find out, and I'll add a non-gawk version. – Ed Morton Jun 24 '15 at 12:10
Ah indeed, I had to escape them. My awk version is `awk version 20070501`. So now it runs, however it removes all `*/`, even those not beginning with `/* @patch`. :o – Tot Jun 24 '15 at 12:33
Are you SURE you copy/pasted the last script in my answer exactly? If that script is removing `*/`s that don't start with `/* @patch` then best I can tell that means your awk is significantly broken as it's removing text that you haven't told it to remove but we'd need to see exactly what script you are running along with the input you are running it against and the output you are getting to be able to help you debug it so edit your question to show all of that. – Ed Morton Jun 24 '15 at 12:53
Done. I've updated and added my results. And yes, I'm extra-sure that I've used all your `awk` examples. Though as you said, my `awk` might be broken, but it would be better if someone else tests the same for us. – Tot Jun 24 '15 at 15:15
I get the same result you do with `nawk`. All other awks I tried behave as I'd expect. Despite its full name of `new awk`, `nawk` is actually an old, pre-POSIX awk so it will behave "odd" at times and is best avoided if possible. It's not nearly as bad as old, broken awk though (aka `oawk`) so no need to try TOO hard to avoid it. It looks like `nawk` does not like NUL chars, so replace every `\0` with some control character, e.g. `control-C` or the value of `SUBSEP`. I've updated my answer to show those options. – Ed Morton Jun 24 '15 at 15:28
Alright, got it. :) I will keep that in mind. Thanks for your help. – Tot Jun 26 '15 at 15:48
