How do I remove all comments that start with /* and end with */? I have tried the following, which works for a one-line comment:

sed '/\/\*/d' 

But it does not remove multiline comments. For example, the second and third lines below are not removed:

/*!50500 PARTITION BY RANGE (TO_SECONDS(date_time ))
 PARTITION 20120102parti VALUES LESS THAN (63492681600),
(PARTITION 20120101parti VALUES LESS THAN (63492595200) */ ;

In the above example, I need to retain the last ; after the closing comment sign.

shantanuo
  • `awk '/*/,/*\//'` would return all the comments. I need the text except the comments. – shantanuo Oct 25 '12 at 05:21
  • possible duplicate of [How can I delete all /\* \*/ comments from a C source file?](http://stackoverflow.com/questions/1714530/how-can-i-delete-all-comments-from-a-c-source-file) – Vijay Oct 25 '12 at 05:53
  • If you are referring to a C source file, then use `cpp -P your_cpp_file`. – Vijay Oct 25 '12 at 05:54
  • You need more than cpp. See the discussion you referenced and my answer here (and there now!). – Ed Morton Oct 25 '12 at 06:29

7 Answers

Here's one way using GNU sed. Run it like this: sed -rf script.sed file.txt

Contents of script.sed:

:a
s%(.*)/\*.*\*/%\1%
ta
/\/\*/ !b
N
ba

Alternatively, here's the one liner:

sed -r ':a; s%(.*)/\*.*\*/%\1%; ta; /\/\*/ !b; N; ba' file.txt
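
For readers who want to see what each command in that script does, here is the same program spelled out with comments. It is only a sketch for GNU sed, and the function name strip_block_comments is made up for illustration; it is not part of the original answer.

```shell
# Hypothetical wrapper around the sed program shown above (GNU sed assumed).
strip_block_comments() {
  sed -r '
    # Loop entry point.
    :a
    # Remove one complete /* ... */ from the pattern space, keeping the text before it.
    s%(.*)/\*.*\*/%\1%
    # If a comment was removed, go back and look for another one.
    ta
    # No opening /* left: branch to the end of the script and print.
    /\/\*/ !b
    # An unclosed /* remains: append the next input line and try again.
    N
    ba
  ' "$@"
}
```

Usage would be strip_block_comments file.txt. Note that the substitution is re-run on the whole accumulated pattern space after every N, so the work per step grows with the length of the comment, which may be related to the slowdowns reported in the comments.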
Steve
  • Works for 2 or 3 lines comments. Unfortunately I have thousands of lines as comment and it does not seem to complete the job. – shantanuo Oct 25 '12 at 05:55
  • @shantanuo: It works well for me. What do you mean by 'it does not seem to complete the job'? – Steve Oct 25 '12 at 06:07
  • It consumes 99% CPU and the server does not respond for a long time. I have to kill the process using Ctrl + C. – shantanuo Oct 25 '12 at 06:16
  • I have the same issue as shantanuo with sed 4.2.2 – martinkunev Mar 28 '17 at 13:49
  • @martinkunev This and all of the other sed solutions will fail given various input values since, for example, they can't distinguish between `/*` as the start of a comment vs `/*` inside a comment vs `/*` inside a string, etc. And that's not even taking trigraphs into consideration :-). That's why you need to use a tool that understands the language, like `cpp` or `gcc -E` if this is C or C++. – Ed Morton Mar 28 '17 at 13:58
  • @EdMorton Thanks, I was just looking for a quick solution and didn't actually consider that sed cannot possibly work. It just crossed my mind that people usually forget about trigraphs. Now that I think about your comment, it all obviously makes sense. I ended up using your solution :) – martinkunev Mar 28 '17 at 14:11
  • @martinkunev Somewhere online there is a **massive** sed script (I think this is the one, IIRC: http://sed.sourceforge.net/grabbag/scripts/remccoms3.sed) that tries to do this job. Years ago I pointed out a case where it failed, and the author and I went through many iterations of me pointing out failing cases and him putting band-aids on the script until we both just got tired of it. The only way it'd work would be if you could write a robust C parser in sed, and even **IF** that were possible I can't imagine a more pointless way to spend your time than trying to do that! – Ed Morton Mar 28 '17 at 14:15
  • Does not work when manipulating file directly with sed (sed -i), for some reason. Otherwise it works well, thanks! // sed (GNU sed) 4.4 – MetalGodwin Sep 14 '18 at 22:41
  • Causes sed to go into indefinite loop. https://github.com/jacob-carlborg/dstep/files/2527976/test.txt – Arun Oct 30 '18 at 04:29
  • @Arun: Looks like you'll need the [C locale](https://www.gnu.org/software/sed/manual/html_node/Locale-Considerations.html) for that: `LC_ALL=C sed -r '...' file`. – Steve Oct 30 '18 at 06:52
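
For inputs with very long comments, one alternative is to avoid accumulating lines at all. The sketch below swaps sed for a small awk state machine that scans each line once; it is not the method from the answer above, the function name strip_comments_stream is hypothetical, and like the sed versions it knows nothing about string literals.

```shell
# Hypothetical streaming variant: awk instead of sed, one pass, no line accumulation.
strip_comments_stream() {
  awk '
    {
      line = $0
      while (line != "") {
        if (in_comment) {
          # Inside a comment: skip everything up to and including the closing */.
          pos = index(line, "*/")
          if (pos == 0) break
          line = substr(line, pos + 2)
          in_comment = 0
        } else {
          # Outside a comment: emit text up to the next /* (or the whole rest).
          pos = index(line, "/*")
          if (pos == 0) { printf "%s", line; break }
          printf "%s", substr(line, 1, pos - 1)
          line = substr(line, pos + 2)
          in_comment = 1
        }
      }
      # Finish the output line only when no comment is still open.
      if (!in_comment) print ""
    }
  ' "$@"
}
```

Text before an opening /* and text after the matching */ end up on the same output line, which mirrors what the sed loop produces for the example in the question.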
If this is in a C file then you MUST use a C preprocessor for this, in combination with other tools to temporarily disable specific preprocessor functionality like expanding #defines or #includes. All other approaches will fail in edge cases. This will work for all cases:

[ $# -eq 2 ] && arg="$1" || arg=""
eval file="\$$#"
sed 's/a/aA/g; s/__/aB/g; s/#/aC/g' "$file" |
          gcc -P -E $arg - |
          sed 's/aC/#/g; s/aB/__/g; s/aA/a/g'

Put it in a shell script and call it with the name of the file you want parsed, optionally prefixed by a flag like "-ansi" to specify the C standard to apply.

See https://stackoverflow.com/a/35708616/1745001 for details.
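
The two sed commands wrapped around gcc in that script can be tried on their own to see why the round trip is lossless. In the sketch below the helper names hide_from_cpp and restore_after_cpp are made up for illustration; the sed expressions are copied from the script. Every literal a is first rewritten as aA, so afterwards an a followed by B or C can only be one of the markers standing in for __ or #.

```shell
# Hypothetical helper names; the sed expressions come from the script above.
hide_from_cpp() {
  # Escape "a" first, then replace "__" and "#" with markers built on "a".
  sed 's/a/aA/g; s/__/aB/g; s/#/aC/g' "$@"
}

restore_after_cpp() {
  # Undo the markers in the reverse order.
  sed 's/aC/#/g; s/aB/__/g; s/aA/a/g' "$@"
}

# Round trip without the preprocessor in between: the output equals the input.
printf '%s\n' '#define __FOO a aB aC' | hide_from_cpp | restore_after_cpp
```

Between the two steps the text contains no # and no __, so the preprocessor has no directives or double-underscore names to act on and is left with comment removal.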

Ed Morton
This should do it:

 sed 's|/\*|\n&|g;s|*/|&\n|g' a.txt | sed '/\/\*/,/*\//d'

For test:

a.txt

/* Line test
multi
comment */
Hello there
this would stay 
/* this would be deleted */

Command:

$ sed 's|/\*|\n&|g;s|*/|&\n|g' a.txt | sed '/\/\*/,/*\//d'
Hello there
this would stay 
Anshu
This might work for you (GNU sed):

sed -r ':a;$!{N;ba};s|/\*[^*]*\*+([^/*][^*]*\*+)*/||' file

It's a start, anyway!
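
As written, the substitution above has no g flag, so it removes only the first comment in the file. A commented sketch of the same idea with g added follows; the function name strip_comments_slurp is hypothetical, GNU sed is assumed, and the whole input is held in memory at once.

```shell
# Hypothetical wrapper; same regular expression as above, plus the g flag.
strip_comments_slurp() {
  sed -r '
    # Read the whole input into the pattern space.
    :a
    $!{
      N
      ba
    }
    # /\*                 the opening delimiter
    # [^*]*\*+            comment text up to a run of stars
    # ([^/*][^*]*\*+)*    stars not followed by a slash: keep scanning
    # /                   the slash that closes the comment
    s|/\*[^*]*\*+([^/*][^*]*\*+)*/||g
  ' "$@"
}
```

Because the pattern can never step over a closing */, two comments on the same line are removed separately and the text between them is kept.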

potong
To complement Ed's answer (focused on C files), I would suggest the excellent sed script remccoms3.sed by Brian Hiles for non-C files (e.g. PL/SQL files). It handles C and C++ (//) comments and correctly skips comments inside strings. The script is available here: http://sed.sourceforge.net/grabbag/scripts/remccoms3.sed

GregV
  • IIRC Brian and I went back and forth on that script for a while maybe about 15 years ago on usenet. I kept demonstrating cases where it fails, he kept fixing them, until I got fed up doing it as there was no end in sight. I wouldn't trust it to be robust in general (and I've no reason to use it given you can do the job concisely and robustly with gcc) but at that time we were discussing C and C++ so it may be just fine for the non-C files you mention and it's probably OK for most C and C++ files too. – Ed Morton Apr 09 '19 at 02:55
Try this

sed "/^\//,/\/;/d" filename
fedorqui
Constantine Gladky
A sed-only solution:

sed -r 's/\/\*(.*?)\*\///g' \
    | sed -r 's/(.+)(\/\*)/\1\n\2/g'\
    | sed -r 's/(\*\/)(.+)/\1\n\2/g' \
    | sed '/\/\*/,/\*\// s/.*//'

Shortcomings: multi-line comments will leave empty lines (because sed is line-based, unless you put in superhuman efforts).

Explanation

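  • s/\/\*(.*?)\*\///g will take care of single-line comments. Note that sed's extended regular expressions have no lazy quantifiers, so .*? here is still greedy: on a line with two comments, the text between them is deleted as well.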
  • s/(.+)(\/\*)/\1\n\2/g and s/(\*\/)(.+)/\1\n\2/g will split lines at the beginning and end of multi-line comments.
  • /\/\*/,/\*\// s/.*// will run the command s/.*// effectively deleting all the lines between the patterns \/\* and \*\/ - which is /* and */ escaped.
vbence
  • That can't detect text within strings that looks like comments and other structures. Try it with a program that contains `printf("%s\n","This will /*not */fail")` - the output speaks for itself. All other solutions that don't use `gcc` have similar problems. – Ed Morton Oct 28 '19 at 14:09
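
The case from that last comment is easy to reproduce. The sketch below feeds the printf line through the first substitution of the sed-only answer; the function name strip_single_line is made up for illustration and GNU sed is assumed.

```shell
# Hypothetical wrapper around the first sed command of the sed-only answer.
strip_single_line() {
  sed -r 's/\/\*(.*?)\*\///g' "$@"
}

# The comment-like text sits inside a string literal, yet it is removed anyway.
printf '%s\n' 'printf("%s\n","This will /*not */fail")' | strip_single_line
# prints: printf("%s\n","This will fail")
```

The program still compiles after this, but its string has silently changed, which is the kind of damage a pattern-based approach cannot rule out.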