Using regex to edit an index.html file using a batch script

Question

Hi, is it possible to edit an .html file using a regex inside a bash script?

Here is what I am trying to do:

replaceText="<a href="some-file-here" id="text">link to the new file</a>"

#open index.html file stream(how?)

#do some if condition that meets the regex below:
IF index.html contains <td abbr="fileOne">(.*)</td>
Index.html replaceText

I'm quite new to bash scripting but I was wondering if the above is possible?

This has to work on both OS X (Unix) and Linux.

Here is the index.html example:

<html>
<head>
</head>
<body style="width: 50%; height: 50%;">
<div style="top: 10%; left: 10%; position: absolute;">
<img border="0" src=“icon.png” alt="Hello World" width="120" height="120">
<table style="width:300px">
<tr>
<td abbr=“file one”><a href=“someFile” id="text">Install file one here…</a></td>
<td abbr=“fileTwo”><a href=“someFileTwo” id="text">install file Two here…</a></td>
<td></td>
</tr>
</table>
</div>
</body>
</html>

Thanks in advance

Edit: I tried using the sed command below:

sed -i.bak 's/<td abbr="fileOne">(.*?)<\/td>/WHAT_YOU_WANT/' index.html

However, I get the following error when I open the .bak file:

syntax error near unexpected token `newline'

do you mean batch scripting or bash scripting? you said that you wanted your solution to work on osx unix and linux but batch-files are windows only? — Mike H-R, Jul 01 '14 at 15:18
use sed, POSIX compliant sed is available on all those systems, see answer. (that is provided you want to replace the line matched with your replace_text which is what the question seems to be asking) — Mike H-R, Jul 01 '14 at 15:21
I tried using sed but it complains about the HTML containing newlines. When I tried to manually remove them and inline the whole HTML, it then starts complaining about the "<" characters — Jono, Jul 01 '14 at 16:15

Federico Piazza · Answer 1 · 2014-08-29T15:03:51.217

2

You can use the sed command to do this.

If you want to replace <td abbr="fileOne">(.*)</td> you can use the following:

sed -E 's/<td abbr=[“"]fileOne["”]>(.*)<\/td>/WHAT_YOU_WANT/' index.html

Here you have a working example:

Working demo

To change the file itself, pass the -i option so sed edits it in place; with a suffix such as .bak, sed first saves a backup copy of the original file:

sed -i.bak -E 's/<td abbr=[“"]fileOne["”]>(.*)<\/td>/WHAT_YOU_WANT/' index.html

If you don't want to use slashes as the delimiter, you can use # instead (then you don't have to escape the slashes in the pattern either):

sed -i.bak -E 's#<td abbr=[“"]fileTwo["”]>(.*)</td>#WHAT_YOU_WANT#' index.html
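
To tie these pieces together, here is a minimal sketch of a script that should behave the same with GNU sed on Linux and BSD sed on OS X. It assumes each <td>...</td> sits on its own line, as in the question's index.html, and that the attributes use straight quotes. The file name demo.html, the replace_text variable and the escaping step are illustrative additions, not part of the original answer. Unlike the commands above, it keeps the <td> tags and swaps only what is between them.

```shell
#!/bin/sh
# Sample input; demo.html is a hypothetical stand-in for index.html.
cat > demo.html <<'EOF'
<tr>
<td abbr="fileOne"><a href="someFile" id="text">Install file one here</a></td>
<td abbr="fileTwo"><a href="someFileTwo" id="text">Install file two here</a></td>
</tr>
EOF

# Single quotes keep the inner double quotes intact.
replace_text='<a href="some-file-here" id="text">link to the new file</a>'

# Escape backslash, & and the # delimiter, which are special in the
# replacement part of an s### command.
escaped=$(printf '%s\n' "$replace_text" | sed 's/[\\&#]/\\&/g')

# -E selects extended regexes on both GNU and BSD sed; -i.bak edits in place
# and keeps the original as demo.html.bak. Plain .* is used because POSIX
# regexes have no lazy .*? quantifier.
sed -i.bak -E "s#(<td abbr=\"fileOne\">).*(</td>)#\\1${escaped}\\2#" demo.html

cat demo.html
```

Because .* is greedy, this only works while there is one cell per line; two <td> elements on the same line would be swallowed by a single match.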

edited Aug 29 '14 at 15:03

answered Jul 01 '14 at 15:17


Cheers, how does sed open and interact with the index.html file? – Jono Jul 01 '14 at 15:22
sed 's/(.*?)<\/td>/WHAT_YOU_WANT/' index.html > index.html – CKK Jul 01 '14 at 15:28
Hi i tried that example and it just wiped the contents of index.html – Jono Jul 01 '14 at 15:30
@jonney check again pls – Federico Piazza Jul 01 '14 at 15:40
Just did. It totally removed the contents of the html file :( umm – Jono Jul 01 '14 at 16:03
edit: i just saw your update with the new optional arguments. trying now – Jono Jul 01 '14 at 16:04
ok i looked at the .bak file and saw this error: syntax error near unexpected token `newline' – Jono Jul 01 '14 at 16:09
@jonney the .bak file is the backup; you have to check the original file. Btw, if you get a syntax error you probably have a typo. That works for me – Federico Piazza Jul 01 '14 at 16:16
@jonney try a simple execution like: sed -i.bak 's//WHAT_YOU_WANT/' index.html – Federico Piazza Jul 01 '14 at 16:19
@jonney, I'm sorry if I caused any inconvenience. -i.bak works pretty well for me. – CKK Jul 01 '14 at 16:22
the simple command of editing the works but not whats inside the td tag – Jono Jul 01 '14 at 16:27
@jonney I suspect of your sed version. So, try adding -E. Like: -E 's/(.*?)<\/td>/WHAT_YOU_WANT/' – Federico Piazza Jul 01 '14 at 16:57
Nope. i got this following error: sed: 1: "s/(. ...": RE error:sed: illegal option -- . usage: sed script [-Ealn] [-i extension] [file ...] sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...] – Jono Jul 02 '14 at 08:42
@jonney I've updated the answer with another expression. Both of them are working for me and also for other guys that posted comment. Check the updated answer and if it doesn't work post a comment with your execution, I think you have a typo. – Federico Piazza Jul 02 '14 at 15:01
@Fede did you notice the double quotes in the input? – Avinash Raj Aug 29 '14 at 14:02
@AvinashRaj Hi Avi... didn't see that. Thanks. I'll fix it right now – Federico Piazza Aug 29 '14 at 15:02

score 1 · Answer 2 · edited May 23 '17 at 11:49

If you want to test whether a certain pattern is present in a file and then run some other script if it is, you could test the output of grep. I'm including this answer for completeness' sake.

if [ "$(grep -Ec '<td abbr="fileOne">.*</td>' index.html)" -ne 0 ]
then
        some_func_you_want_to_run    #this is the case where the line is present
else
        exit 1                       #this is the case where it isn't
fi

exit 0

It bears noting that regexes are not a good fit for parsing HTML, but since I'm hoping all you are doing is replacing a single line, using sed as above would be the best way to do it. If you have more stringent needs, I'd recommend a scripting language such as Ruby, Python or Perl together with an HTML parser, such as Nokogiri for Ruby.
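
For the single-line case, the check and the replacement can be combined. The following is a minimal sketch rather than a definitive script: the file name page.html, the function name replace_cell and the sample content are made up for illustration. It writes sed's output to a temporary file and then moves it over the original, because redirecting straight back to the input file (sed ... file > file) makes the shell truncate the file before sed reads it.

```shell
#!/bin/sh
# Hypothetical sample file standing in for index.html.
printf '%s\n' '<tr>' '<td abbr="fileOne">old link</td>' '</tr>' > page.html

# replace_cell FILE: illustrative helper, not from the original answers.
replace_cell() {
    file=$1
    pattern='<td abbr="fileOne">.*</td>'

    # grep -q prints nothing and reports a match through its exit status.
    if grep -Eq "$pattern" "$file"; then
        # Write to a temporary file first, then replace the original.
        sed -E "s#${pattern}#<td abbr=\"fileOne\">new link</td>#" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        echo "pattern not found in $file" >&2
        return 1
    fi
}

replace_cell page.html
cat page.html
```

This variant avoids the -i option entirely, which sidesteps the differences between GNU and BSD sed in how -i takes its backup suffix.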
