
Extract "value=" only from non-comment portion

The sed expression below also picks up the value from the commented-out code.

I tried grep as well, but that doesn't work either.

#!/bin/sh
#set -x

FILE="/tmp/comment.txt"
create_file () {
    echo "/*" > "$FILE"
    echo "this is a multi" >> "$FILE"
    echo "line with" >> "$FILE"
    echo "var=20" >> "$FILE"
    echo "and ending the comment */" >> "$FILE"
    echo "var=15" >> "$FILE" # line after comment
}

create_file
cat "$FILE"
# This sed should extract the value only from var=15, which is not inside a
# comment. How can that be done?
# The output should be just 15, not "20 and 15".
sed -n "s/\(var=\)\([0-9]*\)/\2/p" "$FILE"

Actual:

/*
this is a multi
line with
var=20
and ending the comment */
var=15
20
15

Expected:

/*
this is a multi
line with
var=20
and ending the comment */
var=15
15
satya
  • btw, the ending comment "*/" can also be on a new line – satya Jun 30 '19 at 15:25
  • It's not possible in general without a language parser. What you've provided as an example would be trivial to handle but, for example, how can you handle cases where `/*` or `*/` appear within a string, or `\*/` appears within a comment, without a parser for the language whose commented sections you're trying to ignore? If your commented code is C or C++ then see https://stackoverflow.com/a/13062682/1745001 for how to remove comments from it. – Ed Morton Jun 30 '19 at 17:40
  • @EdMorton Yes, for complex cases it becomes tough without a parser and also messy with awk or sed. That link you provided is helpful! Thank you – satya Jul 01 '19 at 04:55

2 Answers


This seems to work:

sed -n -e:a -e'/\*\//d;/\/\*/{N;ba
};s/^var=//p'

The easy part is extracting the value from the line; the hard part is removing the comment first. Rough translation: if the text read so far contains */ then delete all of it and move on to the next line; otherwise if it contains /* then append the next line as well and start over; otherwise if the line starts with "var=" then delete that part and print the rest.

Note 1: that annoying line break may not be needed in your version of sed.
Note 2: I advise you to test this on the command line, before you attempt it from within a script.
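
For anyone who wants to try it, here is a minimal sketch that rebuilds the question's sample file and runs the command above on it, with each sed step spelled out in comments. The `extract_var` function name and the `mktemp` path are assumptions made for illustration; they are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of each uncommented "var=" line.
extract_var() {
    # :a            label marking the top of the loop
    # /\*\//d       pattern space contains "*/": the comment is complete, discard it
    # /\/\*/{N;ba}  pattern space contains "/*": append the next line and retry
    # s/^var=//p    otherwise strip a leading "var=" and print what is left
    sed -n -e ':a' -e '/\*\//d;/\/\*/{N;ba
};s/^var=//p' "$1"
}

sample=$(mktemp)   # assumed temporary path; the question used /tmp/comment.txt
printf '%s\n' '/*' 'this is a multi' 'line with' 'var=20' \
    'and ending the comment */' 'var=15' > "$sample"

result=$(extract_var "$sample")
echo "$result"     # prints 15
rm -f "$sample"
```

Because the whole pattern space is deleted once a */ is seen, this relies on each `var=...` having a line to itself; anything that follows a */ on the same line is lost.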

Beta
  • won't this break on a line like `/* ... */ var=val` ? – jhnc Jun 30 '19 at 18:45
  • @Beta, Thanks it worked great! (No issue with the line break) – satya Jul 01 '19 at 04:44
  • @jhnc I tested with extra single line comment /* var=30 */ and it works fine – satya Jul 01 '19 at 04:45
  • @jhnc: Yes, it would. I made certain assumptions, including that the pattern in the example text, where each `var=...` has a line to itself, would hold in the real text. To make a really rigorous solution, I'd've had to ask about a lot of corner cases. – Beta Jul 01 '19 at 18:21

This is a cheap and cheerful way to remove comments like the ones you've shown, using GNU awk for its multi-char RS:

$ awk -v RS='[*]/' -v ORS= '{sub("/[*].*","")}1' file

var=15

It'll strip the comments no matter where they start/stop on each line:

$ cat file
here's some text /* here's a comment */ and more text /* bleh */and more /*
this is a multi
line with
ending here */ and more
var=20/*
and ending the comment */
/* commented */ var=15

$ awk -v RS='[*]/' -v ORS= '{sub("/[*].*","")} 1' file
here's some text  and more text and more  and more
var=20
 var=15

It just can't tell when text that looks like a comment start/stop is actually inside a string or some other language-specific construct.

You can pipe that to whatever you like to get the value of var. If that's not all you need then get/use a parser for whatever language your commented code is written in, e.g. see https://stackoverflow.com/a/13062682/1745001 for C/C++.
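
As a sketch of that last step, the stripped output can be piped into sed to pull out the number. This assumes an awk that accepts a regular expression as RS (GNU awk, as noted above); the `strip_comments` and `get_var` names and the temporary file are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: remove /* ... */ comments using the awk command above.
strip_comments() {
    awk -v RS='[*]/' -v ORS= '{sub("/[*].*","")} 1' "$1"
}

# Hypothetical helper: print the digits after "var=" on lines that start with
# it, allowing for leading whitespace left behind by a removed comment.
get_var() {
    sed -n 's/^[[:space:]]*var=\([0-9][0-9]*\).*/\1/p'
}

sample=$(mktemp)   # assumed temporary path
printf '%s\n' '/*' 'this is a multi' 'line with' 'var=20' \
    'and ending the comment */' '/* commented */ var=15' > "$sample"

value=$(strip_comments "$sample" | get_var)
echo "$value"      # prints 15
rm -f "$sample"
```

Stripping comments first and extracting afterwards keeps the two concerns separate, so a `var=` that shares a line with a closed comment is still found.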

Ed Morton
  • Thank you for the awk version. It works for the sample provided. Since I have already used sed version, am gonna choose that as answer – satya Jul 01 '19 at 04:53
  • That's fine but the sed version won't work for even just the sample input I used in my answer. – Ed Morton Jul 01 '19 at 12:03