Hi, I am trying to write a script to parse some HTML files to make a job a bit easier, but I'm having no luck. I've tried reading other threads and manuals, to no avail. I seem to get stuck on the round brackets (parentheses).

I want to replace all appearances of:

$FORMTOP("2")$ with $FORMTOP("3")$

$WHITE*("5")$ with $WHITE*("10")$

</b> with </strong>

<tr><td with <tr>, then a newline, a tab and <td

delete occurrences of <td></td>

Adam Rackis
Aydin Hassan
  • Your question is unclear. Do you want to delete all occurrences of <td></td> with nothing in between? Do you want to delete everything in a .* construction, but only when it appears after the 3 specific lines in the example? What is the purpose of the 4-line block of code (7 with blank lines) in the question? – William Pursell Nov 18 '11 at 14:17
  • Yes, I want to delete all occurrences of <td></td> with nothing in between. I used code formatting because I couldn't display the arrows properly. Every occurrence of – Aydin Hassan Nov 18 '11 at 14:19

2 Answers

In sed you have to type the newline (a "\" followed by Enter) and the indentation (here, 8 presses of the spacebar in place of a tab) manually in the replacement part of the command.

[jaypal@MBP-13~/temp] sed 's/<tr><td/<tr>\
        <td/g' test123
<tr>
        <td 

<tr>
        <td 
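Typing the indentation by hand gives spaces rather than a real tab. A minimal sketch, assuming a POSIX shell, that builds the tab with printf and passes it into the same substitution; the variable names tab and out and the sample input are made up for illustration:

```shell
# Hold a real tab character in a variable (names here are illustrative).
tab=$(printf '\t')

# Replace "<tr><td" with "<tr>", a newline, a tab, then "<td".
# Inside double quotes, \\ becomes the single backslash that sed
# needs in front of the embedded newline.
out=$(printf '%s\n' '<tr><td>cell</td></tr>' | sed "s/<tr><td/<tr>\\
${tab}<td/g")

printf '%s\n' "$out"
```

The sed script is in double quotes here only so that ${tab} expands; anything else containing a "$" would then need escaping from the shell.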
jaypal singh

I can't say for certain that this will work on Solaris, as I don't have it available anymore, but I'm only using standard Sun/Solaris sed commands, nothing fancy, so I think it should work.

{
cat <<-EOS
\$FORMTOP("2")$
\$WHITE*("5")$
</b>
<tr><td
EOS
} |sed '
s/\$FORMTOP("2")\$/\$FORMTOP("3")\$/g
s/\$WHITE\*("5")\$/\$WHITE\*("10")\$/g
s/<\/b>/<\/strong>/g
/<tr><td/{
  s/<td//
  a\
    <td

}
'

#output 
$FORMTOP("3")$
$WHITE*("10")$
</strong>
<tr>
        <td

For this test harness, which uses { cat <<-EOS ... EOS }, I had to escape the '$' characters that were being interpreted as variable references by the shell. If you put the test data in a file, be sure to remove the '\'s in front of the '$'s.
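An alternative to escaping each '$' in the harness is to quote the here-document delimiter, which switches off shell expansion inside the body. A small sketch, assuming a POSIX shell; the variable name result is made up, and only the first substitution is shown:

```shell
# Quoting the delimiter ('EOS') turns off $ expansion inside the
# here-document, so the test data is written exactly as it appears
# in the HTML files.
result=$(cat <<'EOS' | sed 's/\$FORMTOP("2")\$/$FORMTOP("3")$/g'
$FORMTOP("2")$
$WHITE*("5")$
EOS
)

printf '%s\n' "$result"
```

In the sed pattern the '$' still needs a backslash, because an unescaped '$' at the end of a pattern anchors to end of line; in the replacement it is an ordinary character.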

EDIT: Also, the lines that look indented in the sed script are indented with spaces, except for the character just before your final <td, which is a tab.

Also, since you wrote 'I've tried reading other threads', you did find the S.O. number one post concerning fixing XML with sed, right?

I hope this helps.

shellter
  • Thanks for that, everything works apart from the tab appearing after, it just puts the – Aydin Hassan Nov 18 '11 at 15:14
  • Yes, S.O. only displays spaces. I had a tab char in there; if you edit the code and put in a real tab, it will work. Can you please accept the answer? http://i.imgur.com/uqJeW.png . Good luck. – shellter Nov 18 '11 at 15:17
  • hm... so the line in question should look like $sp$sp$tab – shellter Nov 18 '11 at 15:39
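
Neither answer covers the last request in the question, deleting empty <td></td> pairs. A rough sketch that applies all five edits in one pass, assuming a POSIX shell and sed; fix_html is a hypothetical helper name, the tab comes from printf rather than being typed, and the sample input is invented:

```shell
# fix_html (hypothetical name): reads HTML on stdin, writes the edited
# text to stdout.
fix_html() {
  tab=$(printf '\t')
  # Empty cells are deleted first, so that "<tr><td></td>" does not get
  # split into "<tr>", newline, tab and then lose its cell afterwards.
  sed -e 's/<td><\/td>//g' \
      -e 's/\$FORMTOP("2")\$/$FORMTOP("3")$/g' \
      -e 's/\$WHITE\*("5")\$/$WHITE*("10")$/g' \
      -e 's/<\/b>/<\/strong>/g' \
      -e "s/<tr><td/<tr>\\
${tab}<td/g"
}

# Usage with made-up sample lines:
fixed=$(printf '%s\n' '$FORMTOP("2")$' '$WHITE*("5")$' 'bold</b>' \
  '<tr><td></td><td>a</td></tr>' | fix_html)
printf '%s\n' "$fixed"
```

The order of the expressions is a design choice: with the deletion last, a row starting with an empty cell would end up as "<tr>" followed by a line containing only a tab.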