
Hi, is it possible to edit an .html file using a regex inside a bash script?

Here is what I am trying to do:

replaceText='<a href="some-file-here" id="text">link to the new file</a>'

# open the index.html file stream (how?)

# if index.html contains a match for the regex below:
#     <td abbr="fileOne">(.*)</td>
# then replace that match in index.html with $replaceText

I'm quite new to bash scripting, so I was wondering whether the above is possible.

This has to work on both OS X and Linux.

Here is the index.html example:

<html>
<head>
</head>
<body style="width: 50%; height: 50%;">
<div style="top: 10%; left: 10%; position: absolute;">
<img border="0" src=“icon.png” alt="Hello World" width="120" height="120">
<table style="width:300px">
<tr>
<td abbr=“file one”><a href=“someFile” id="text">Install file one here…</a></td>
<td abbr=“fileTwo”><a href=“someFileTwo” id="text">install file Two here…</a></td>
<td></td>
</tr>
</table>
</div>
</body>
</html>

Thanks in advance

Edit: I tried using the sed command below:

sed -i.bak 's/<td abbr="fileOne">(.*?)<\/td>/WHAT_YOU_WANT/' index.html

However, I get the following error when I open the .bak file:

syntax error near unexpected token `newline'
Federico Piazza
Jono
  • Do you mean batch scripting or bash scripting? You said that you wanted your solution to work on OS X and Linux, but batch files are Windows-only. – Mike H-R Jul 01 '14 at 15:18
  • Sorry, bash scripting. It has to work on Unix/Linux. – Jono Jul 01 '14 at 15:19
  • Use sed; a POSIX-compliant sed is available on all those systems, see answer. (That is, provided you want to replace the matched line with your replace text, which is what the question seems to be asking.) – Mike H-R Jul 01 '14 at 15:21
  • I tried using sed, but it complains about the HTML containing newlines. When I tried to manually remove the newlines and inline the whole HTML, it then starts complaining about the "<" characters.

2 Answers


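You can use the sed command to do this.

If you want to replace <td abbr="fileOne">(.*)</td>, you can use the following. The -E option selects extended regexes (in sed's default basic syntax, ( and ) are literal characters). POSIX regexes have no lazy .*? quantifier, so plain .* is used; that is fine here because each <td> sits on its own line. The bracket expressions also accept the curly quotes that appear in the question's HTML (matching those requires a UTF-8 locale):

sed -E 's/<td abbr=[“"]fileOne["”]>.*<\/td>/WHAT_YOU_WANT/'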
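Before editing the file, the pattern can be tried on a single line. The sketch below (plain POSIX sh; the sample line is the question's fileOne cell rewritten with straight quotes, and WHAT_YOU_WANT is only a placeholder) also shows why the attempt in the question changes nothing: without -E, sed uses basic regexes, where ( ? and ) are ordinary characters, so the pattern never matches.

```shell
#!/bin/sh
# Sample line based on the question's HTML (straight quotes assumed).
sample='<td abbr="fileOne"><a href="someFile" id="text">Install file one here</a></td>'

# Extended regex (-E): .* spans everything between the opening <td ...>
# and the closing </td>, so the whole cell is replaced.
with_ere=$(printf '%s\n' "$sample" | sed -E 's/<td abbr="fileOne">.*<\/td>/WHAT_YOU_WANT/')

# Basic regex (no -E): "(", "?" and ")" are literal characters here, and the
# sample contains none of them, so sed leaves the line unchanged.
with_bre=$(printf '%s\n' "$sample" | sed 's/<td abbr="fileOne">(.*?)<\/td>/WHAT_YOU_WANT/')

printf 'ERE: %s\nBRE: %s\n' "$with_ere" "$with_bre"
```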


Pass the -i option to sed to edit the file in place; with -i.bak it also saves a backup of the original as index.html.bak. Attach the suffix directly to -i (no space) so that the same command works with both GNU sed on Linux and BSD sed on OS X:

sed -i.bak -E 's/<td abbr=[“"]fileOne["”]>.*<\/td>/WHAT_YOU_WANT/' index.html
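A small self-contained run of the in-place edit, as a sketch: it writes a made-up index.html fragment (straight quotes, one <td> per line) into a temporary directory, replaces the fileOne cell, and leaves the original in index.html.bak. The -i.bak spelling is accepted by both GNU sed and BSD sed, whereas sed -i '' only works with BSD sed and a bare sed -i with no suffix only with GNU sed.

```shell
#!/bin/sh
# Sketch: in-place edit with a backup, in a form GNU and BSD sed both accept.
# The temporary directory and the file contents are made up for the demo.
set -e
workdir=$(mktemp -d)
cd "$workdir"

cat > index.html <<'EOF'
<tr>
<td abbr="fileOne"><a href="someFile" id="text">Install file one here</a></td>
<td abbr="fileTwo"><a href="someFileTwo" id="text">install file Two here</a></td>
</tr>
EOF

# Suffix attached to -i with no space: writes index.html.bak, then edits index.html.
sed -i.bak -E 's/<td abbr="fileOne">.*<\/td>/WHAT_YOU_WANT/' index.html

cat index.html      # the fileOne cell is replaced; the other lines are untouched
cat index.html.bak  # the original contents
```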

If you don't want to use slashes as the separator, you can change it to # (and then you don't need to escape the slashes):

sed -i.bak -E 's#<td abbr=[“"]fileTwo["”]>.*</td>#WHAT_YOU_WANT#' index.html
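The question wants to insert the contents of a shell variable (replaceText) rather than a fixed WHAT_YOU_WANT. One way to do that, sketched here with the # delimiter: in a sed replacement, & means "the whole match", \ starts an escape, and the delimiter ends the replacement, so those three characters are escaped first. escape_sed_replacement is a hypothetical helper name, and this version keeps the <td abbr="fileTwo"> wrapper and only swaps its contents. replaceText is assigned in single quotes because the value itself contains double quotes.

```shell
#!/bin/sh
# replaceText is single-quoted because its value contains double quotes.
replaceText='<a href="some-file-here" id="text">link to the new file</a>'

# Hypothetical helper: put a backslash in front of \, & and # so sed treats
# them literally in the replacement (# is the delimiter used below).
escape_sed_replacement() {
    printf '%s\n' "$1" | sed -e 's/[\\&#]/\\&/g'
}

safe=$(escape_sed_replacement "$replaceText")

# Double quotes let the shell expand $safe inside the sed script; the
# <td abbr="fileTwo"> wrapper is kept and only its contents are replaced.
line='<td abbr="fileTwo"><a href="someFileTwo" id="text">install file Two here</a></td>'
result=$(printf '%s\n' "$line" | sed -E "s#<td abbr=\"fileTwo\">.*</td>#<td abbr=\"fileTwo\">$safe</td>#")
printf '%s\n' "$result"
```

The same sed script can be run with -i.bak against index.html once it gives the expected output on a sample line.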
Federico Piazza

If you want to test whether a certain pattern occurs in a file and run some other script only if it does, you can check the result of grep. I'm including this answer for completeness' sake.

if grep -Eq '<td abbr="fileOne">.*</td>' index.html
then
        some_func_you_want_to_run    # the line is present
else
        exit 1                       # the line is not present
fi

exit 0
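Putting the grep check and the sed replacement together gives roughly the flow sketched in the question. In this sketch, replace_if_present is a hypothetical helper; grep -q reports a match only through its exit status, which if can test directly, and the pattern assumes straight quotes and one <td> per line. Because # is the sed delimiter here, replaceText must not contain #, & or \ (escape them first if it might).

```shell
#!/bin/sh
# Sketch of "if the file contains the pattern, replace it", using grep's
# exit status instead of counting matches.
replaceText='<a href="some-file-here" id="text">link to the new file</a>'
pattern='<td abbr="fileOne">.*</td>'

# Hypothetical helper: edits the file in place (keeping FILE.bak) when the
# pattern is found; otherwise reports the miss and returns 1.
replace_if_present() {
    file=$1
    if grep -Eq "$pattern" "$file"; then
        # "#" is the delimiter, so replaceText must not contain #, & or \.
        sed -i.bak -E "s#$pattern#<td abbr=\"fileOne\">$replaceText</td>#" "$file"
    else
        echo "pattern not found in $file" >&2
        return 1
    fi
}
```

Usage: replace_if_present index.html || exit 1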

It bears noting that regexes are not a good fit for parsing HTML, but since I'm hoping all you are doing is replacing a single line, using sed as above is the simplest way to do it. If you have more demanding needs, I'd recommend a scripting language such as Ruby, Python or Perl together with an HTML parser, such as Nokogiri for Ruby.

Mike H-R