
I want to remove script calls from an HTML file with the following script:

var=$(sed  -e '/^<script.*</script>$/d' -e '/.js/!d' testFile.html)

sed -i -e "/$var/d" testFile.html 

Sample input file:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>
<script type="text/javascript" src="script.js" language="javascript">
</script>

<script>
// script code
</script>
</head>
<body>

</body>
</html>

Sample output file:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>

</script>

<script>
// script code
</script>
</head>
<body>

</body>
</html>

But it gives the following error:

sed: -e expression #1, char 23: unterminated `s' command

Thanks in advance

hunlu

2 Answers

0

Try this:

root@isadora:~/temp# sed -e '/^<script/,/<\/script>/d' aaaa.html 
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>JavaScript</title>

</script>

</head>
<body>

</body>
</html>
root@isadora:~/temp# 

Regards.

0

It is unclear why you break this up into two separate scripts or what you hope for the variable to contain. This can be performed trivially with a single script.

The immediate problem is that you cannot use a literal unescaped slash in a regex if you use slash as the regex separator. Either use a different separator, or backslash-escape any literal slashes.

sed -i -e '\#^<script.*</script>$#d' -e '/\.js/!d' testFile.html

Notice also the backslash before the dot: an unescaped dot in a regex matches any character, so /.js/ also matches, for example, the string notjs.
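Both points can be checked quickly from the command line. This is only a minimal sketch: the sample strings fed in through `printf` are made up for illustration and are not taken from the question's file.

```shell
# An unescaped dot matches any character, so /.js/ also matches "notjs".
loose=$(printf '%s\n' 'notjs' 'script.js' 'plain' | sed -n '/.js/p')

# With the dot escaped, only a literal ".js" matches.
strict=$(printf '%s\n' 'notjs' 'script.js' 'plain' | sed -n '/\.js/p')

# Using # as the address delimiter lets the pattern contain a literal slash.
closed=$(printf '%s\n' '<script>x</script>' '<p>text</p>' | sed '\#^<script.*</script>$#d')

printf '%s\n' "$loose" "$strict" "$closed"
```

The first command prints both `notjs` and `script.js`, the second prints only `script.js`, and the third drops the one-line script element and keeps the paragraph.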

tripleee
  • Thank you very much. Actually, I want to remove only those script calls. This command removes everything other than the script call, which is the reverse of what I want. How can I do that? – hunlu Jul 16 '18 at 13:37
  • It's not really clear what exactly the regexes should match, but removing `-e '/\.js/!d'` will apparently produce *almost* the expected output. – tripleee Jul 16 '18 at 13:41
  • Perhaps you are really looking for `s###g'` actually...? – tripleee Jul 16 '18 at 13:44
  • ... Though fundamentally, an XML-aware tool like `xmlstarlet` would be a lot better than `sed` for this. – tripleee Jul 16 '18 at 13:46
  • Your sed command works fine but let me give my expected output file content: JavaScript – hunlu Jul 16 '18 at 13:50
  • The output file content of your command is: – hunlu Jul 16 '18 at 13:50
  • Your question already contains the expected output; putting it in a comment as well doesn't add anything. Clarifying what *exactly* you want to change would be more helpful. I notice that the `` tag is on a separate line, so a pure `sed` script is probably not going to be a good solution. This is why I recommend an XPath tool instead. You might also want to google ["html cthulhu the-center-cannot-hold"](https://google.com/search?q=html+cthulhu+the-center-cannot-hold) – tripleee Jul 16 '18 at 14:20
  • Thank you very much! I will have a look at that. – hunlu Jul 16 '18 at 14:32
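
For reference, the substitution approach hinted at in the comments might look roughly like the sketch below. The pattern is an assumption, since the thread does not show it in full: it blanks any opening `<script ...>` tag whose attributes mention a `.js` file and leaves everything else untouched, including the closing `</script>` on the following line, which is what the sample output in the question shows. The function name `strip_js_script_tags` is hypothetical.

```shell
# Hypothetical helper: reads HTML on stdin, writes the result to stdout.
# Assumption: the opening tag and its src attribute sit on a single line.
strip_js_script_tags() {
    sed -e 's#<script[^>]*\.js[^>]*>##g'
}

# Feed in the relevant lines from the question's sample input.
result=$(printf '%s\n' \
    '<title>JavaScript</title>' \
    '<script type="text/javascript" src="script.js" language="javascript">' \
    '</script>' \
    '<script>' \
    '// script code' \
    '</script>' | strip_js_script_tags)

printf '%s\n' "$result"
```

The external script tag becomes an empty line, while the inline `<script>` block is kept. This still treats HTML as plain text, so an XML-aware tool remains the more robust choice for anything beyond simple, line-oriented input.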