
This code worked for a couple of weeks but now returns an error. Any suggestions on what might have happened? It seems the site I'm parsing made minor changes that cause my parser to choke on it ...


Imagine the following string in the file test.html (without the dots):

... </script> <script type="text/javascript" src=" ...

Desired string transformation

Replace <script type="text/javascript" with <tagkilled

With the following php code

    $file = "test.html";

    // Destroy javascript codetag
    $command='/bin/sed -ri \'s/<script type="text\/javascript"/<tagkilled/g\' '.str_replace(' ','\ ',$file);
    exec($command);

Returned error message

/bin/sed: -e expression #1, char 34: Invalid preceding regular expression

Sidenotes: Running sed 4.2.1 Dec. 2010 on Ubuntu 12.10.

somethis
  • You'll always be susceptible to changes made in the HTML of a remote site. HTML is not something that you can use Regex on ([see here](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)). For a more robust solution read up on PHP's DOMDocument class. – Jul 14 '13 at 09:19
  • The code above prepares the document for parsing via a DOM parser. In order to get into the scripted segments of the page I need to remove the script tags though. That's exactly what the code does. – somethis Jul 14 '13 at 10:23

1 Answer


This works here with GNU sed. Try replacing the slash delimiter in sed's s command with another character: s#search#replace#[flags]:

    $ cat file
    </script> <script type="text/javascript" src="

    $ sed 's#<script type="text/javascript"#<tagkilled#g' file
    </script> <tagkilled src="


The g flag is not needed here, since the pattern occurs only once on the line.
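To see both points side by side, here is a minimal sketch, assuming a POSIX shell and any standard sed. The scratch file and its two-tag line are made up for illustration and are not from the question. It uses `#` as the delimiter so the slash in `text/javascript` needs no escaping, and runs the substitution once without and once with the g flag.

```shell
#!/bin/sh
# Hypothetical scratch file: name and contents are invented for this demo.
tmp=$(mktemp)
printf '%s\n' '</script> <script type="text/javascript" src="a"> <script type="text/javascript" src="b">' > "$tmp"

# '#' as the s-command delimiter: the slash in text/javascript stays unescaped.
# Without g, only the first match on each line is replaced.
first_only=$(sed 's#<script type="text/javascript"#<tagkilled#' "$tmp")

# With g, every match on each line is replaced.
all_matches=$(sed 's#<script type="text/javascript"#<tagkilled#g' "$tmp")

printf '%s\n' "$first_only"
printf '%s\n' "$all_matches"

rm -f "$tmp"
```

On a line that contains the tag more than once, as a real page may, the g flag does matter, so keeping it in the original command is harmless.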


You should not process HTML and similar file formats with regex. Use a proper parsing tool instead.

captcha
  • Also, doesn't PHP have some sort of sub/repl utility of its own? There is a question every day about using `sed` from `php`. What's up with that!? ;-) Good luck to all. – shellter Jul 14 '13 at 16:52