Compare files and comment the same lines in new file

Question

Goal: I want to compare two Suricata rule files and, for every line (alert "SID") that is commented out in file1, comment out the same line in file2 unless it is already commented out. I understand there is a better way to do this with the Suricata threshold file, but unfortunately I don't have that luxury, for reasons beyond what I can explain here. This is to facilitate updating the rules: a rule may get updated, but the common element, the "SID", stays the same across both files.

I'm not sure where to start.

Sample file1 text:

alert $home_net any > $External_net any (msg: example; content: something; sid: 12345; rev:1)
#alert $home_net any > $External_net any (msg: example; content: something; sid: 67895; rev:1)
alert $home_net any > $External_net any (msg: example; content: something; sid: 18975; rev:1)

Sample file2 text:

alert $home_net any > $External_net any (msg: example; content: something; sid: 12345; rev:1)
<insert #>alert $home_net any > $External_net any (msg: example; content: something; sid: 67895; rev:1)
alert $home_net any > $External_net any (msg: example; content: something; sid: 18975; rev:1)

Edit: The provided solution works with the initial sample data above; however, it doesn't work with actual signatures, so I'm providing actual signatures below. Also, rules may or may not have blank lines between them.

Sample file1 text:

#alert tcp $EXTERNAL_NET any -> $HOME_NET 2200 (msg:"ET EXPLOIT CA BrightStor ARCserve Mobile Backup LGSERVER.EXE Heap Corruption"; flow:established,to_server; content:"|4e 3d 2c 1b|"; depth:4; isdataat:2891,relative; reference:cve,2007-0449; reference:url,doc.emergingthreats.net/bin/view/Main/2003369; classtype:attempted-admin; sid:2003369; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert udp $EXTERNAL_NET any -> $HOME_NET 111 (msg:"ET EXPLOIT Computer Associates Brightstor ARCServer Backup RPC Server (Catirpc.dll) DoS"; content:"|00 00 00 00|"; offset:4; depth:4; content:"|00 00 00 03|"; distance:8; within:4; content:"|00 00 00 08|"; distance:0; within:4; content:"|00 00 00 00|"; distance:0; within:4; content:"|00 00 00 00|"; distance:4; within:4; content:"|00 00 00 00 00 00 00 00|"; distance:8; within:32; reference:url,www.milw0rm.com/exploits/3248; reference:url,doc.emergingthreats.net/bin/view/Main/2003370; classtype:attempted-dos; sid:2003370; rev:3; metadata:created_at 2010_07_30, updated_at 2020_08_20;)

#alert tcp $EXTERNAL_NET any -> $HOME_NET 1900 (msg:"ET EXPLOIT Computer Associates Mobile Backup Service LGSERVER.EXE Stack Overflow"; flow:established,to_server; content:"0000033000"; depth:10; isdataat:1000,relative; reference:url,www.milw0rm.com/exploits/3244; reference:url,doc.emergingthreats.net/bin/view/Main/2003378; classtype:attempted-admin; sid:2003378; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

Sample file2 text:

#alert tcp $EXTERNAL_NET any -> $HOME_NET 2200 (msg:"ET EXPLOIT CA BrightStor ARCserve Mobile Backup LGSERVER.EXE Heap Corruption"; flow:established,to_server; content:"|4e 3d 2c 1b|"; depth:4; isdataat:2891,relative; reference:cve,2007-0449; reference:url,doc.emergingthreats.net/bin/view/Main/2003369; classtype:attempted-admin; sid:2003369; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)
alert udp $EXTERNAL_NET any -> $HOME_NET 111 (msg:"ET EXPLOIT Computer Associates Brightstor ARCServer Backup RPC Server (Catirpc.dll) DoS"; content:"|00 00 00 00|"; offset:4; depth:4; content:"|00 00 00 03|"; distance:8; within:4; content:"|00 00 00 08|"; distance:0; within:4; content:"|00 00 00 00|"; distance:0; within:4; content:"|00 00 00 00|"; distance:4; within:4; content:"|00 00 00 00 00 00 00 00|"; distance:8; within:32; reference:url,www.milw0rm.com/exploits/3248; reference:url,doc.emergingthreats.net/bin/view/Main/2003370; classtype:attempted-dos; sid:2003370; rev:3; metadata:created_at 2010_07_30, updated_at 2020_08_20;)
<insert #>alert tcp $EXTERNAL_NET any -> $HOME_NET 1900 (msg:"ET EXPLOIT Computer Associates Mobile Backup Service LGSERVER.EXE Stack Overflow"; flow:established,to_server; content:"0000033000"; depth:10; isdataat:1000,relative; reference:url,www.milw0rm.com/exploits/3244; reference:url,doc.emergingthreats.net/bin/view/Main/2003378; classtype:attempted-admin; sid:2003378; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

Looks to me like you might need a Perl or Python script that can parse the SID out of commented-out rules, then comment out the rules with the same SID in another file. Sorry, I don't have any one-liner type of shell ideas for you. — Jason, May 08 '21 at 22:10
@Jason Don't underestimate the power of standard Unix tools :-) If this couldn't be done with sed, awk would have been a good substitute. No need to resort to higher-level languages. — xhienne, May 08 '21 at 22:46
Point taken. I'd find it easier to do in Python, but I wrote a whole rule management tool in Python. But always impressed by some sed mastery! — Jason, May 09 '21 at 02:05
From what I understand, you could probably use the `suricata-update` tool to remove duplicated rules and produce only one file, `suricata.rules`, to be used globally. This tool is clever enough to manage rules, so you don't have to do all of this. — MaXi32, Oct 08 '21 at 06:11
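Following up on the `suricata-update` suggestion, here is a minimal sketch of how the commented-out sids could be carried over without editing file2 at all. It assumes suricata-update's `disable.conf` format, which accepts one signature ID per line (with `#` comments); the file names, the temporary directory and the abbreviated rules are hypothetical:

```shell
#!/bin/sh
# Hypothetical, abbreviated rules; only the "sid:NNN;" layout matters here.
workdir=$(mktemp -d)
cat > "$workdir/file1" <<'EOF'
#alert tcp any any -> any 2200 (msg:"A"; sid:2003369; rev:3;)
alert udp any any -> any 111 (msg:"B"; sid:2003370; rev:3;)
#alert tcp any any -> any 1900 (msg:"C"; sid:2003378; rev:3;)
EOF

# Write one sid per line, the form disable.conf accepts for disabling by signature ID.
{
  echo '# Generated from the commented-out rules of file1'
  sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' "$workdir/file1"
} > "$workdir/disable.conf"

cat "$workdir/disable.conf"
```

The generated file could then be passed to the tool, e.g. `suricata-update --disable-conf "$workdir/disable.conf"`, so the same rules stay disabled after each update; check the option against your suricata-update version.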

xhienne · Accepted Answer · 2021-05-09T23:10:17.970

First, examine the first file and find out what sids are commented out:

sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' file1
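To try the command in isolation, here is a minimal sketch that writes a trimmed-down file1 (hypothetical, abbreviated rules with the same `sid:NNN;` layout as the real signatures, including the blank lines mentioned in the question) to a temporary directory and runs the extraction:

```shell
#!/bin/sh
# Hypothetical, abbreviated rules separated by blank lines.
workdir=$(mktemp -d)
cat > "$workdir/file1" <<'EOF'
#alert tcp $EXTERNAL_NET any -> $HOME_NET 2200 (msg:"ET EXPLOIT A"; sid:2003369; rev:3;)

alert udp $EXTERNAL_NET any -> $HOME_NET 111 (msg:"ET EXPLOIT B"; sid:2003370; rev:3;)

#alert tcp $EXTERNAL_NET any -> $HOME_NET 1900 (msg:"ET EXPLOIT C"; sid:2003378; rev:3;)
EOF

# Print the sid of every line that starts with '#', one per line.
# Blank lines and active rules are skipped by the /^#/ address.
sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' "$workdir/file1"
```

This should print `2003369` and `2003378`, each on its own line; the active rule with sid 2003370 is ignored.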

The above command prints out the sid of the lines that begin with a #, one sid per line. Now let's aggregate those lines and build a list of sids separated with |:

sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' file1 | paste -sd '|'
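A small sketch of the joining step on the same kind of hypothetical sample. One portability note, to the best of my knowledge: GNU paste reads standard input when no file is given, while BSD/macOS paste expects an explicit `-`, so the sketch passes `-`:

```shell
#!/bin/sh
# Hypothetical, abbreviated rules.
workdir=$(mktemp -d)
cat > "$workdir/file1" <<'EOF'
#alert tcp any any -> any 2200 (msg:"A"; sid:2003369; rev:3;)
alert udp any any -> any 111 (msg:"B"; sid:2003370; rev:3;)
#alert tcp any any -> any 1900 (msg:"C"; sid:2003378; rev:3;)
EOF

# Join the extracted sids into a single "sid1|sid2|..." alternation.
# The trailing "-" tells paste to read standard input.
sid_regex=$(sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' "$workdir/file1" | paste -sd '|' -)
echo "$sid_regex"
```

The output should be `2003369|2003378`.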

Fine, now we have sid1|sid2|...|sidN. As written, this string is an extended regular expression (an alternation) that can identify the lines in file2 that need to be commented out. Let's put this regex in a variable:

sid_regex=$(sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' file1 | paste -sd '|')

Now, we can modify file2 so that every line 1) with a sid that matches the regex and 2) that doesn't already begin with # is commented out:

sed -E "/sid:($sid_regex);/ s/^[^#]/#&/" file2 > file2.new

Voilà! To sum it up:

$ sid_regex=$(sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' file1 | paste -sd '|')
$ sed -E "/sid:($sid_regex);/ s/^[^#]/#&/" file2 > file2.new
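Put together as a small script, the two commands might look like the sketch below. It adds one guard that is not in the answer: if file1 has no commented-out rules, the regex would become an empty group `()`, which POSIX leaves undefined for EREs, so the sketch simply copies file2 in that case. File names and sample rules are hypothetical:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical old rules: sids 2003369 and 2003378 are commented out.
cat > file1 <<'EOF'
#alert tcp any any -> any 2200 (msg:"A"; sid:2003369; rev:3;)
alert udp any any -> any 111 (msg:"B"; sid:2003370; rev:3;)
#alert tcp any any -> any 1900 (msg:"C"; sid:2003378; rev:3;)
EOF

# Hypothetical new rules: 2003369 is already commented, 2003378 is not.
cat > file2 <<'EOF'
#alert tcp any any -> any 2200 (msg:"A"; sid:2003369; rev:4;)

alert udp any any -> any 111 (msg:"B"; sid:2003370; rev:4;)

alert tcp any any -> any 1900 (msg:"C"; sid:2003378; rev:4;)
EOF

sid_regex=$(sed -En '/^#/ s/.*sid:([0-9]+).*/\1/p' file1 | paste -sd '|' -)

if [ -n "$sid_regex" ]; then
  # Comment out lines whose sid is listed and whose first character is not '#'.
  sed -E "/sid:($sid_regex);/ s/^[^#]/#&/" file2 > file2.new
else
  # Nothing is commented out in file1: keep file2 unchanged.
  cp file2 file2.new
fi

cat file2.new
```

Only the sid 2003378 rule gains a `#`; the already-commented 2003369 rule is not double-commented because `^[^#]` does not match it.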

[update] You have so many commented lines that the resulting huge regex makes the command too big ("Argument list too long"). Let us try another approach: instead of building a one-line sed program with a gigantic regex, we will build a multi-line sed program, with one line for each sid.

This first sed command generates the second sed program:

sed -En '/^#/ s|.*(sid:[0-9]+;).*|/\1/ s/^[^#]/#\&/|p' file1

The result should be something like:

/sid:111;/ s/^[^#]/#&/
/sid:222;/ s/^[^#]/#&/
...
/sid:123456;/ s/^[^#]/#&/
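To see exactly what the generator emits, here is a sketch on hypothetical sample rules that saves the generated program to a file (the name `comment.sed` is made up for illustration) and prints it:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cat > "$workdir/file1" <<'EOF'
#alert tcp any any -> any 2200 (msg:"A"; sid:2003369; rev:3;)
alert udp any any -> any 111 (msg:"B"; sid:2003370; rev:3;)
#alert tcp any any -> any 1900 (msg:"C"; sid:2003378; rev:3;)
EOF

# "|" is the delimiter of the outer s command, so "/" is literal in the
# replacement, and "\&" produces a literal "&" in the generated program.
sed -En '/^#/ s|.*(sid:[0-9]+;).*|/\1/ s/^[^#]/#\&/|p' "$workdir/file1" > "$workdir/comment.sed"
cat "$workdir/comment.sed"
```

It should print two lines, `/sid:2003369;/ s/^[^#]/#&/` and `/sid:2003378;/ s/^[^#]/#&/`. Keeping the trailing `;` in the address also prevents a short sid from matching the prefix of a longer one.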

Now we feed a second sed with that program in order to process file2:

sed -En '/^#/ s|.*(sid:[0-9]+;).*|/\1/ s/^[^#]/#\&/|p' file1 | sed -f - file2 > file2.new
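GNU sed reads the program from standard input with `-f -`; I am not certain every sed implementation does, so a slightly more portable variant writes the generated program to a file first. A minimal sketch with hypothetical file names and sample rules:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file1 <<'EOF'
#alert tcp any any -> any 2200 (msg:"A"; sid:2003369; rev:3;)
alert udp any any -> any 111 (msg:"B"; sid:2003370; rev:3;)
#alert tcp any any -> any 1900 (msg:"C"; sid:2003378; rev:3;)
EOF

cat > file2 <<'EOF'
#alert tcp any any -> any 2200 (msg:"A"; sid:2003369; rev:4;)

alert udp any any -> any 111 (msg:"B"; sid:2003370; rev:4;)

alert tcp any any -> any 1900 (msg:"C"; sid:2003378; rev:4;)
EOF

# Step 1: one "/sid:N;/ s/^[^#]/#&/" command per commented-out rule in file1.
sed -En '/^#/ s|.*(sid:[0-9]+;).*|/\1/ s/^[^#]/#\&/|p' file1 > comment.sed

# Step 2: apply that program to file2. Each sid has its own short regex,
# so neither the command line nor any single regex grows with the sid count.
sed -f comment.sed file2 > file2.new
cat file2.new
```

If file1 has no commented-out rules, `comment.sed` is empty and the second sed just copies file2.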

edited May 09 '21 at 23:10

answered May 08 '21 at 22:33

xhienne

I think I made an error last time. I copied and pasted again, and now it outputs file2.new, but without the updated comments. – grizzly May 08 '21 at 23:27
OK, I figured it out; it was an error on my end. Thanks again, this saved me a lot of time. – grizzly May 08 '21 at 23:36
It seems I was a little premature in marking the solution correct. It works with my sample data; however, when I use real-world signatures it doesn't seem to work. I'll update the post to include real sample data. – grizzly May 09 '21 at 14:29
So... I made a mistake again: in the real signatures there is no space between "sid:" and the sid number. After I corrected that, it works. However, I get an "Argument list too long" error; apparently sed can't handle all the commented rules, so I may be out of luck. Unless there is a way around this? – grizzly May 09 '21 at 15:22
Would putting the results from the first sed command into a separate file and then processing that with sed help? – grizzly May 09 '21 at 15:32
@grizzly This is not a bad idea but, even if we managed to put the final sed command in a file, I'm afraid the huge regex you get may break sed too (sed program lines are probably limited in length). I have taken another path, see the last part of my answer. I also took your other comment into account (no space after `sid:`). – xhienne May 09 '21 at 23:07
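Since awk was mentioned earlier as a substitute, here is a hedged sketch of a different technique (not the accepted answer's sed approach): awk reads file1 first, stores the commented-out sids as keys of an array, then comments out matching uncommented lines of file2. Lookups use array keys instead of one large regex, so the number of sids does not affect the command size. It also tolerates an optional space after `sid:`, as in the first sample data. File names, the function name `sid_of` and the sample rules are hypothetical:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Note the space in "sid: 2003369", as in the question's first sample.
cat > file1 <<'EOF'
#alert tcp any any -> any 2200 (msg:"A"; sid: 2003369; rev:3;)
alert udp any any -> any 111 (msg:"B"; sid:2003370; rev:3;)
#alert tcp any any -> any 1900 (msg:"C"; sid:2003378; rev:3;)
EOF

cat > file2 <<'EOF'
alert tcp any any -> any 2200 (msg:"A"; sid: 2003369; rev:4;)

alert udp any any -> any 111 (msg:"B"; sid:2003370; rev:4;)
#alert tcp any any -> any 1900 (msg:"C"; sid:2003378; rev:4;)
EOF

awk '
  # Return the numeric sid of a rule line, or "" if there is none.
  # "s" after the extra spaces is a local variable (awk idiom).
  function sid_of(line,    s) {
    if (match(line, /sid: *[0-9]+;/)) {
      s = substr(line, RSTART, RLENGTH)
      gsub(/[^0-9]/, "", s)
      return s
    }
    return ""
  }
  # First file (NR == FNR): remember the sids of commented-out rules.
  NR == FNR {
    if (substr($0, 1, 1) == "#" && (sid = sid_of($0)) != "") off[sid] = 1
    next
  }
  # Second file: prefix "#" to uncommented lines whose sid was remembered.
  substr($0, 1, 1) != "#" && (sid_of($0) in off) { $0 = "#" $0 }
  { print }
' file1 file2 > file2.new

cat file2.new
```

One caveat of the `NR == FNR` idiom: if file1 is empty, awk treats file2 as the first file and prints nothing, so it is worth checking that file1 is non-empty first.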
