An example describes it better. Suppose you have a structure like this:
<h1>TITLE OF HEAD 1</h1>
<table>
<tbody>
<tr>
<td class="one">ITEM 1, AFTER HEAD 1</td>
</tr>
<tr>
<td class="one">ITEM 2, AFTER HEAD 1</td>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<td class="one">ITEM 3, AFTER HEAD 1</td>
</tr>
<tr>
<td class="one">ITEM 4, AFTER HEAD 1</td>
</tr>
<tr>
<td class="one">ITEM 5, AFTER HEAD 1</td>
</tr>
</tbody>
</table>
<h1>TITLE OF HEAD 2</h1>
<table>
<tbody>
<tr>
<td class="one">ITEM 6, AFTER HEAD 2</td>
</tr>
</tbody>
</table>
<h1>TITLE OF HEAD 3</h1>
<table>
<tbody>
<tr>
<td class="one">ITEM 7, AFTER HEAD 3</td>
</tr>
<tr>
<td class="one">ITEM 8, AFTER HEAD 3</td>
</tr>
<tr>
<td class="one">ITEM 9, AFTER HEAD 3</td>
</tr>
<tr>
<td class="one">ITEM 10, AFTER HEAD 3</td>
</tr>
</tbody>
</table>
<h1>TITLE OF HEAD 4</h1>
<table>
<tbody>
<tr>
<td class="one">ITEM 11, AFTER HEAD 4</td>
</tr>
<tr>
<td class="one">ITEM 12, AFTER HEAD 4</td>
</tr>
</tbody>
</table>
And with regex, the outcome should be:
<h1>TITLE OF HEAD 1</h1>
<table>
<tbody>
<tr>
<td class="one">ITEM 1, AFTER HEAD 1</td>
<td class="two">TITLE OF HEAD 1</td>
</tr>
<tr>
<td class="one">ITEM 2, AFTER HEAD 1</td>
<td class="two">TITLE OF HEAD 1</td>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<td class="one">ITEM 3, AFTER HEAD 1</td>
<td class="two">TITLE OF HEAD 1</td>
</tr>
<tr>
<td class="one">ITEM 4, AFTER HEAD 1</td>
<td class="two">TITLE OF HEAD 1</td>
</tr>
<tr>
<td class="one">ITEM 5, AFTER HEAD 1</td>
<td class="two">TITLE OF HEAD 1</td>
</tr>
</tbody>
</table>
<h1>TITLE OF HEAD 2</h1>
<table>
<tbody>
<tr>
<td class="one">ITEM 6, AFTER HEAD 2</td>
<td class="two">TITLE OF HEAD 2</td>
</tr>
</tbody>
</table>
<h1>TITLE OF HEAD 3</h1>
<table>
<tbody>
<tr>
<td class="one">ITEM 7, AFTER HEAD 3</td>
<td class="two">TITLE OF HEAD 3</td>
</tr>
<tr>
<td class="one">ITEM 8, AFTER HEAD 3</td>
<td class="two">TITLE OF HEAD 3</td>
</tr>
<tr>
<td class="one">ITEM 9, AFTER HEAD 3</td>
<td class="two">TITLE OF HEAD 3</td>
</tr>
<tr>
<td class="one">ITEM 10, AFTER HEAD 3</td>
<td class="two">TITLE OF HEAD 3</td>
</tr>
</tbody>
</table>
<h1>TITLE OF HEAD 4</h1>
<table>
<tbody>
<tr>
<td class="one">ITEM 11, AFTER HEAD 4</td>
<td class="two">TITLE OF HEAD 4</td>
</tr>
<tr>
<td class="one">ITEM 12, AFTER HEAD 4</td>
<td class="two">TITLE OF HEAD 4</td>
</tr>
</tbody>
</table>
What I've tried so far:
Getting the text inside an <h1> is easy:
find: (<h1>)(.*?)(</h1>)
replace: $2
Then I tried:
find: (<h1>)(.*?)(</h1>)(\n|.)*?(<td class="one">.*?</td>)
replace: $5<td class="two">$2</td>
This works, but the tags between the <h1> and the td are removed as well, so I modified it to capture and re-emit them:
find: (<h1>)(.*?)(</h1>)((\n|.)*?)(<td class="one">.*?</td>)
replace: $4$6<td class="two">$2</td>
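For reference, the behaviour of this last attempt can be reproduced outside the editor. The following is a minimal sketch in Python, assuming its `re` module behaves like the editor's regex engine for this pattern; the sample markup and the variable names are made up for illustration, and the `$n` references are written as `\n` because that is Python's replacement syntax.

```python
import re

# Shortened, made-up sample: one heading followed by two rows.
html = """<h1>HEAD A</h1>
<table>
<tr>
<td class="one">ITEM 1</td>
</tr>
<tr>
<td class="one">ITEM 2</td>
</tr>
</table>
"""

# Same pattern as the second attempt. Group 4 is everything between
# </h1> and the first td; group 6 is that first td.
pattern = r'(<h1>)(.*?)(</h1>)((\n|.)*?)(<td class="one">.*?</td>)'

# Equivalent of: $4$6<td class="two">$2</td>
result = re.sub(pattern, r'\4\6<td class="two">\2</td>', html)

print(result)
# Number of cells that received the heading text.
print(result.count('class="two"'))
```

Each match has to start at an <h1>, and the match consumes that <h1>, so only the first td after each heading can ever be part of a match. Note also that this replacement drops the <h1> itself.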
The text of each h1 should be used for every td that follows it, until the next h1 occurs, whose text is then used instead. The problem is that this only works for the first td after each h1, not for all of the tds.
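To make the intended behaviour concrete, here is a minimal sketch in Python rather than a single find/replace. It is not an editor solution, only an illustration of the "remember the last h1" logic. The function name `add_heading_cells` and the sample markup are hypothetical, and the sketch assumes each <h1> and each td sits on a single line with exactly the attributes shown; regex over HTML is fragile beyond such simple cases.

```python
import re

def add_heading_cells(html: str) -> str:
    """Append a class="two" cell with the latest <h1> text after each class="one" cell."""
    current = None  # text of the most recent <h1>, None before the first one

    def repl(m: re.Match) -> str:
        nonlocal current
        if m.group(1) is not None:
            # An <h1> matched: remember its text and leave it unchanged.
            current = m.group(1)
            return m.group(0)
        if current is None:
            # A td before any heading: nothing to add.
            return m.group(0)
        # A td matched: keep it and add the heading cell on the next line.
        return f'{m.group(2)}\n<td class="two">{current}</td>'

    # One pass over headings and cells in document order.
    return re.sub(r'<h1>(.*?)</h1>|(<td class="one">.*?</td>)', repl, html)

sample = """<h1>HEAD A</h1>
<table>
<tr>
<td class="one">ITEM 1</td>
</tr>
<tr>
<td class="one">ITEM 2</td>
</tr>
</table>
<h1>HEAD B</h1>
<table>
<tr>
<td class="one">ITEM 3</td>
</tr>
</table>
"""

result = add_heading_cells(sample)
print(result)
```

The alternation matches headings and cells in one left-to-right pass, so the callback always sees an h1 before the tds that belong to it. This sketch keeps the <h1> elements in place.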
Could somebody tell me what needs to be added to the regex for this to work?
Thank you!