
I need to create a script that processes a couple of HTML files and converts lines of this type:

```html
<link rel="stylesheet" href="assets/css/main.css">
```

.. into this:

```
{stylesheets file='assets/css/main.css'}
        <link rel="stylesheet" href="{$asset_url}">
{/stylesheets}
```

I also need to convert the JavaScript includes in the HTML files the same way. This:

```html
<script type="text/javascript" src="vendor/revolution/revolution.extension.migration.min.js"></script>
```

.. into this:

```
{javascripts file='vendor/revolution/revolution.extension.migration.min.js'}
    <script src="{$asset_url}"></script>
{/javascripts}
```

I know some basic bash tricks with regex, e.g. search and replace, but I don't know enough to do this alone.

I would really appreciate your help.

Thank you in advance.

EDIT:

The `link` and `script` tags are always on one line, but their attributes aren't always in the same order.

If regex is a bad idea for HTML, which tool would you suggest?

oguz ismail
  • Raise separate questions for two different requests. Thanks. – Raja G Sep 18 '18 at 09:37
  • Are you sure the `link` and `script` are always located on one line? Are you sure their arguments are always in the same order? – choroba Sep 18 '18 at 09:40
  • obligatory [don't parse HTML with regex](https://stackoverflow.com/q/1732348/7552) link – glenn jackman Sep 18 '18 at 10:26
  • Thank you all for your responses. As suggested by Robert, I edited my question to add the details you asked for. @Robert: Of course, but as mentioned in my message, I only have bits of code and I don't know how to assemble them into this script. That's why I'm asking experienced people. – consumptivas Sep 18 '18 at 12:49

1 Answer


Parsing HTML with regex is not a good idea, but this ugly and very-likely-to-fail sed command may have a chance:

```sh
# GNU sed (needed for \+ and \n); replace <input_file> with the HTML file to process
sed \
-e 's/\(<link rel="stylesheet" href="\)\([^"]\+\)\(">\)/\n{stylesheets file='\''\2'\''}\n\t\1{$asset_url}\3\n{\/stylesheets}\n/g' \
-e 's/\(<script \)type="text\/javascript" \(src="\)\([^"]\+\)\("><\/script>\)/\n{javascripts file='\''\3'\''}\n\t\1\2{$asset_url}\4\n{\/javascripts}\n/g' \
<input_file>
```
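Since that command only matches the exact attribute order shown in the question, it can silently skip tags. A quick check after the conversion can list whatever was left behind. This is a minimal sketch, assuming the converted output was saved back to `.html` files; `report_unconverted` is a hypothetical helper name, and like the sed command it works line by line, so it also assumes one tag per line.

```sh
# Hypothetical helper: after the sed conversion, list lines that still hold a
# stylesheet <link> or a <script src="..."> tag without the {$asset_url}
# placeholder, i.e. tags the regex missed. Output format: file:line:content
report_unconverted() {
  grep -nHE -e '<link[^>]*rel="stylesheet"' -e '<script[[:space:]][^>]*src="' "$@" \
    | grep -vF '{$asset_url}'
}
```

Usage: `report_unconverted page1.html page2.html`. No output means every stylesheet link and script include on those lines was converted; anything printed needs a manual fix or a better pattern.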
oguz ismail
  • Wow, it works as expected! Many thanks! But how could this code be improved, for example if the attributes aren't in the same order? Is there a better way, using Perl or another language/command? – consumptivas Sep 18 '18 at 13:03
  • Neither sed nor similar tools are suitable for such a complicated task. You'd better use a more powerful language like Python with an HTML parser. – oguz ismail Sep 18 '18 at 13:23
  • I just read up on HTML parsers and discovered BeautifulSoup. Do you know this tool? What do you recommend? I am looking for a Stack Overflow question close to my request. – consumptivas Sep 18 '18 at 13:34
  • No, I don't, and since I have almost no experience with HTML parsing I can't recommend a specific library. – oguz ismail Sep 18 '18 at 13:38
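If an HTML parser is not an option, the attribute-order problem from the edit can still be handled somewhat better with sed, as long as each tag sits on its own line. The sketch below wraps this in a shell function; `convert_asset_tags` is a hypothetical name, and it assumes GNU sed (for `-E` and `\n` in replacements), double-quoted attribute values, at most one tag per line, and the tag starting the line after any indentation. It is still not an HTML parser and will miss unusual markup.

```sh
# Hypothetical helper: rewrites stylesheet <link> tags and <script src> tags into
# {stylesheets}/{javascripts} blocks, whatever the attribute order.
# Assumes GNU sed (-E, \n in replacements), one tag per line, the tag at the
# start of the line (after indentation) and double-quoted attribute values.
# The original indentation is kept; the inner tag is indented 4 more spaces.
convert_asset_tags() {
  sed -E \
    -e '/<link[^>]*rel="stylesheet"/ s#^([[:space:]]*)<link[^>]*[[:space:]]href="([^"]*)"[^>]*>#\1{stylesheets file='\''\2'\''}\n\1    <link rel="stylesheet" href="{$asset_url}">\n\1{/stylesheets}#' \
    -e 's#^([[:space:]]*)<script[^>]*[[:space:]]src="([^"]*)"[^>]*>[[:space:]]*</script>#\1{javascripts file='\''\2'\''}\n\1    <script src="{$asset_url}"></script>\n\1{/javascripts}#' \
    "$@"
}
```

Usage: `convert_asset_tags < page.html > page.new.html`, or, because extra arguments are passed straight to sed, `convert_asset_tags -i.bak page1.html page2.html` to edit several files in place and keep `.bak` copies. Requiring whitespace before `href`/`src` keeps attributes such as `data-src` from matching. Run it only once per file: the generated `<link rel="stylesheet" href="{$asset_url}">` line would itself match on a second pass.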