0

Hi, I have the following file:

      <strong>Ramandand Sagar Krishna part 34</strong> Vasudev comes back 
and girl disappears from Kansa's hand and the first temple she instructs Devs to make at Vindhyachal <a href="http://www.dailymotion.com/embed/video/x3p3gu?
width=320&#038;theme=none&#038;wmode=transparent">http://www.dailymotion.com/embed/video/x3p3gu?width=320&#038;theme=none&#038;wmode=transparent</a> <a 
href="http://www.dailymotion.com/video/x3p3gu_krishna-part-34_shortfilms" 
target="_blank">Krishna Part 34</a> <strong>Ramandand Sagar Krishna part 35</strong> Celebrations at Yashoda's house and Vasudev Devki freed from jail <a href="http://www.dailymotion.com/embed/video/x3p3sg?width=320&#038;theme=none&#038;wmode=transparent">
http://www.dailymotion.com/embed/video/x3p3sg?width=320&#038;theme=none&#038;wmode=transparent</a> <a href="http://www.dailymotion.com/video/x3p3sg_krishna-part-35_shortfilms" target="_blank">Krishna Part 35</a> <a href="http://www.dailymotion.com/video/x66a71_krishna-143_shortfilms" target="_blank">Krishna 143</a></em></div>

In the above file I want to replace any HTML of the following kind:

<a href="http://www.dailymotion.com/embed/video/x5ftx3?width=320">http://www.dailymotion.com/embed/video/x5ftx3?width=320</a>

The key point is that any HTML tag containing wmode=transparent or width=320 should be replaced with a blank space. Is there an easy way to do this? There are many other HTML tags like <a href=""> </a> that do not contain wmode=transparent. The file posted above is very big, approximately 30K lines of HTML, so I have posted only the relevant lines. I am on an Ubuntu system.

Registered User
  • There's no simple way to do this reliably with sed, because [parsing HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) with regex isn't a good idea. – sorpigal Feb 02 '12 at 12:28
  • If you hover over the tags you have assigned to your question, you'll notice, for the worst case example, that `search-and-replace` has 3 followers. I bet html has more followers than that. The goal of course is to get as many knowledgeable people looking at your question as possible. Good luck! – shellter Feb 02 '12 at 17:21

2 Answers

1

As Sorpigal has pointed out, there is no simple way to solve this. If you're willing to destroy your line endings, you could try my ugly concoction. It might help you:

awk '{ for (i=1; i<=NF; i++) if ($i !~ /wmode=transparent|width=320/) printf "%s ", $i} END {print ""}' file.txt | sed -e "s%<a <a%<a%g"

Output:

<strong>Ramandand Sagar Krishna part 34</strong> Vasudev comes back and girl disappears from Kansa's hand and the first temple she instructs Devs to make at Vindhyachal <a href="http://www.dailymotion.com/embed/video/x3p3gu? <a href="http://www.dailymotion.com/video/x3p3gu_krishna-part-34_shortfilms" target="_blank">Krishna Part 34</a> <strong>Ramandand Sagar Krishna part 35</strong> Celebrations at Yashoda's house and Vasudev Devki freed from jail <a href="http://www.dailymotion.com/video/x3p3sg_krishna-part-35_shortfilms" target="_blank">Krishna Part 35</a> <a href="http://www.dailymotion.com/video/x66a71_krishna-143_shortfilms" target="_blank">Krishna 143</a></em></div>

I'm sure this one-liner could be improved in some way. If you do find this useful, you may then want to split the output on a boundary to tidy things up. Sed can be good for this.
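One possible improvement, sketched here in Python rather than awk/sed (a swapped-in tool, not part of the one-liner above): read the whole file as one string and remove each matching `<a ...>...</a>` element with a single regular expression. The function name `strip_embed_links` is hypothetical, and the sketch assumes the unwanted links are never nested and always close with `</a>`. Line endings outside the removed links are left alone.

```python
import re

# Matches a whole <a ...>...</a> element whose opening tag mentions either
# marker. [^>]* also matches newlines, so an opening tag that is wrapped
# across several lines is still caught.
EMBED_LINK = re.compile(
    r'<a\s[^>]*(?:wmode=transparent|width=320)[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)


def strip_embed_links(html: str) -> str:
    """Replace every matching anchor element with a single blank space."""
    return EMBED_LINK.sub(" ", html)
```

To use it on a file, something like `print(strip_embed_links(open("file.txt").read()))` should do. This is still regex-on-HTML, so the usual caveats apply to unusual markup.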

Steve
0

Here is a link where you can find an answer to your question.

In your case, you have to create a script file for sed, like this:

s/wmode=transparent//g
s/width=320//g

and then run something like this:

sed -f replace_file in.txt > out.txt

I hope this is helpful for you.

Have a nice day.

  • This doesn't solve the problem. He wants to identify `a` tags that have wmode=transparent or width=320, then remove *the entire tag*, not just those parts. Since there's no guarantee each tag is on its own line, `sed` is particularly inappropriate. – sorpigal Feb 02 '12 at 12:30
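For what the comment describes (find the `a` tags carrying either marker and drop the entire element, even when it spans several lines), a parser-based approach may be more robust than line-oriented tools. Below is a minimal sketch using `html.parser` from the Python standard library. The names `EmbedLinkStripper` and `strip_embed_links_parsed` are hypothetical, and the sketch assumes the unwanted anchors are not nested. Everything else is written back out as close to the input as the parser allows (end tags come out lowercased).

```python
from html.parser import HTMLParser

MARKERS = ("wmode=transparent", "width=320")


class EmbedLinkStripper(HTMLParser):
    """Re-emit the input, dropping <a> elements whose start tag has a marker."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &#038; verbatim.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skipping = False  # True while inside an <a> that is being dropped

    def emit(self, text):
        if not self.skipping:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()  # the start tag exactly as written
        if not self.skipping and tag == "a" and any(m in raw for m in MARKERS):
            self.skipping = True
            self.out.append(" ")  # the blank space the question asks for
        else:
            self.emit(raw)

    def handle_startendtag(self, tag, attrs):
        self.emit(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if self.skipping and tag == "a":
            self.skipping = False  # end of the dropped element
        else:
            self.emit(f"</{tag}>")

    def handle_data(self, data):
        self.emit(data)

    def handle_entityref(self, name):
        self.emit(f"&{name};")

    def handle_charref(self, name):
        self.emit(f"&#{name};")

    def handle_comment(self, data):
        self.emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.emit(f"<!{decl}>")


def strip_embed_links_parsed(html: str) -> str:
    parser = EmbedLinkStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because the parser tracks where each element starts and ends, it does not matter whether a tag sits on its own line.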