Sed: Modify lines in an HTML file based on multiple pattern matches

Question

I need to figure out how to do the following, and was wondering whether anyone could offer suggestions on good sed practices (or, more importantly, bad ones) that would help me complete this task.

Basically, I want to go through a file line by line and look for a match on PATTERN1. Once I have a match, I want to look for the next line that matches PATTERN2. If I get a PATTERN2 match, I want to move on to the next PATTERN1 match. If I don't have a match for PATTERN2, I want to find the next occurrence of PATTERN3 and modify it. In the end, every PATTERN1 match should have been handled at its corresponding PATTERN2 or PATTERN3 match.
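The logic above can be sketched as a small line-oriented state machine, the kind of thing awk handles more naturally than sed. This Python version is only a sketch under stated assumptions: PATTERN1/2/3 are regular expressions supplied by the caller, "the next PATTERN2/PATTERN3" is taken to mean "before the next PATTERN1 line", and `process_records` and its `modify` callback are hypothetical names standing in for whatever change you actually want. It buffers each PATTERN1 record and decides only once the record is complete. Like any line-based approach, it breaks as soon as the HTML puts several tags on one line or splits a tag across lines.

```python
import re
from typing import Callable, Iterable, List


def process_records(
    lines: Iterable[str],
    pattern1: str,
    pattern2: str,
    pattern3: str,
    modify: Callable[[str], str],
) -> List[str]:
    """Line-based PATTERN1/2/3 state machine (fragile on real HTML).

    Assumption: "next PATTERN2/PATTERN3" means "before the next PATTERN1 line".
    `modify` is a hypothetical callback that rewrites a PATTERN3 line.
    """
    p1, p2, p3 = re.compile(pattern1), re.compile(pattern2), re.compile(pattern3)
    out: List[str] = []
    record: List[str] = []  # lines from the current PATTERN1 match up to the next one

    def flush() -> None:
        # If the record has no PATTERN2 line, modify its first PATTERN3 line.
        if record and not any(p2.search(line) for line in record):
            for i, line in enumerate(record):
                if p3.search(line):
                    record[i] = modify(line)
                    break
        out.extend(record)
        record.clear()

    for line in lines:
        if p1.search(line):
            flush()            # a new PATTERN1 closes the previous record
            record.append(line)
        elif record:
            record.append(line)
        else:
            out.append(line)   # text before the first PATTERN1 passes through
    flush()                    # close the last record
    return out
```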

Take the following example:

    <tr>
      <td>
        <input type="text" id="record_511568" value="PATTERN1" style="width:200px">
      </td>
      <td>2001-06-29 18:38:21</td>
      <td>2014-06-29 18:38:21</td>
      <td>
        <select id="status_511568">
          <option value="1">1</option>
          <option value="2" selected="selected">2</option>
          <option value="3">3</option>
          <option value="4">4</option>
        </select>
      </td>
    </tr>

I want to match PATTERN1 and then check for the next occurrence of PATTERN2 (1).

If PATTERN2 matched, I want to change it to (<option value="1" selected="selected">1</option>).

If PATTERN2 did not match, I want to make sure it matches PATTERN3 (<option value="1" selected="selected">1</option>).

I want to do this progressively for each PATTERN1 match. Basically, I am modifying a large HTML form to values I have predetermined in a list. Any thoughts on pitfalls I may run into, or advice on multi-pattern matching with sed?

You really don't want to parse tagged languages like HTML or XML using sed. Even if you managed it, you would struggle to read back and understand what you wrote. This is a job for Perl, Python, or another programming language. If you must solve this problem using the oldest and clumsiest tool possible, try awk or gawk. — Captain Pedantic, Jan 05 '14 at 06:00
Obligatory answer: http://stackoverflow.com/a/1732454/1032785 — jordanm, Jan 05 '14 at 06:22
In the example above, I only see `pattern1` and no `pattern2`. This is not a job for `sed`. If you have `sed`, you normally have `awk` too, and it's better for this type of logic. — Jotne, Jan 05 '14 at 10:37
Obligatory answer read, understood, and taken to heart, @jordanm. I was approaching this task incorrectly from the start, and I think I am going to try using JavaScript to make the modifications. — MattSizzle, Jan 06 '14 at 19:36
The title could make people think that this is a question for a reasonable use case. To avoid this misconception, the title should make clear that you were trying to parse HTML. — oberlies, Apr 18 '14 at 15:35

score 1 · Accepted Answer · edited May 23 '17 at 12:11

If you want to avoid bad practices, then don't use sed for this. There's a fantastic explanation here:

RegEx match open tags except XHTML self-contained tags

Use a programming language with a proper HTML or XML parsing library.
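As a minimal sketch of that advice, here is an approach using Python's standard-library `html.parser`. It assumes the predetermined list can be expressed as a dict mapping the numeric record id (the `511568` shared by `record_511568` and `status_511568` in the question's sample) to the option value that should be selected; `SelectRewriter`, `rewrite` and the `choices` format are hypothetical names for illustration. `html.parser` is a tokenizer rather than a full HTML5 tree builder, but that is enough here because the rewrite only needs tag boundaries and attributes. The parser re-emits the document as it goes: untouched start tags come out verbatim, while end tags are lowercased and rewritten `<option>` tags are re-quoted.

```python
from html import escape
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

Attrs = List[Tuple[str, Optional[str]]]


class SelectRewriter(HTMLParser):
    """Re-emit HTML while parsing, setting selected="selected" on the wanted
    <option> inside each <select id="status_NNN"> whose NNN is in `choices`."""

    def __init__(self, choices: Dict[str, str]) -> None:
        super().__init__(convert_charrefs=False)  # report &amp; etc. separately so they survive
        self.choices = choices                   # hypothetical: record id -> value to select
        self.out: List[str] = []
        self.wanted: Optional[str] = None        # value to select inside the current <select>

    def handle_starttag(self, tag: str, attrs: Attrs) -> None:
        if tag == "select":
            sel_id = dict(attrs).get("id") or ""
            record = sel_id[len("status_"):] if sel_id.startswith("status_") else ""
            self.wanted = self.choices.get(record)
        if tag == "option" and self.wanted is not None:
            # Drop any existing `selected`, then add it only to the wanted option.
            kept = [(n, v) for n, v in attrs if n != "selected"]
            if dict(attrs).get("value") == self.wanted:
                kept.append(("selected", "selected"))
            body = " ".join(n if v is None else f'{n}="{escape(v)}"' for n, v in kept)
            self.out.append(f"<{tag} {body}>" if body else f"<{tag}>")
        else:
            self.out.append(self.get_starttag_text())  # other tags pass through verbatim

    def handle_endtag(self, tag: str) -> None:
        if tag == "select":
            self.wanted = None
        self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag: str, attrs: Attrs) -> None:
        self.out.append(self.get_starttag_text())  # e.g. <br/> kept as written

    def handle_data(self, data: str) -> None:
        self.out.append(data)

    def handle_entityref(self, name: str) -> None:
        self.out.append(f"&{name};")

    def handle_charref(self, name: str) -> None:
        self.out.append(f"&#{name};")

    def handle_comment(self, data: str) -> None:
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl: str) -> None:
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data: str) -> None:
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data: str) -> None:
        self.out.append(f"<![{data}]>")


def rewrite(html: str, choices: Dict[str, str]) -> str:
    parser = SelectRewriter(choices)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

For the question's sample, `rewrite(html, {"511568": "1"})` moves `selected="selected"` from option 2 to option 1, and it does so regardless of how the tags are split across lines.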

If you don't want to follow good practices and don't mind using some bad ones, then edit your question to say so explicitly.

answered Jan 05 '14 at 10:11

janos

score 0 · Answer 2 · answered Jan 06 '14 at 11:27

If your HTML is valid XML, you could also try XSLT.
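Python's standard library has no XSLT processor (that would take something like lxml or xsltproc), so the sketch below swaps in `xml.etree.ElementTree` to show the same "treat the page as XML" idea rather than XSLT itself. It only works if the input is well-formed: the question's sample would be rejected because its `<input ...>` tag is never closed (it would need to be written `<input ... />`). As above, the `choices` mapping from record id to option value and the name `select_options_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def select_options_xml(xhtml: str, choices: Dict[str, str]) -> str:
    """Set selected="selected" on the wanted <option> of each <select id="status_NNN">.

    Requires well-formed XML; raises ET.ParseError otherwise.
    `choices` (record id -> option value) is a hypothetical input format.
    """
    root = ET.fromstring(xhtml)
    for select in root.iter("select"):
        sel_id = select.get("id", "")
        if not sel_id.startswith("status_"):
            continue
        wanted = choices.get(sel_id[len("status_"):])
        if wanted is None:
            continue  # record not in the predetermined list: leave it alone
        for option in select.iter("option"):
            if option.get("value") == wanted:
                option.set("selected", "selected")
            elif "selected" in option.attrib:
                del option.attrib["selected"]
    return ET.tostring(root, encoding="unicode")
```

Because `ET.tostring` re-serializes the whole tree, whitespace inside tags and attribute quoting may differ from the input even where nothing was changed.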

answered Jan 06 '14 at 11:27

Peter Faller
