
I have the following HTML string:

<body>
    <img
        alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water."
        height="333"
        src="https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG"
        width="500"
    />
    <br />
    Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp;
    <a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp;
    <a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>.
</body>

I would like to change the domain of every img src from tvfcommunity-dev-ed--c.documentforce.com to globalcommunity.networks.com using Python 3.x.

Note: I am looking for a solution that replaces the domain only if it is present in an img src. It should not be replaced in regular text or in an iframe src.

Any help?

user3164444
  • Parse the HTML using e.g. [`lxml`](https://lxml.de/), find the tag you want to change, and set its src to whatever value you need. – alex Jul 08 '21 at 13:08

2 Answers


You can resolve this with Python's built-in str.replace(), as described in "How to use string.replace() in python 3.x":

str.replace(old, new[, count])

In your case:

yourHtmlContainer = """<body><img alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water." height="333" src="https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG" width="500"><br>Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>. </body>"""
print("Before replace")
print(yourHtmlContainer)


newHtml = yourHtmlContainer.replace("tvfcommunity-dev-ed--c.documentforce.com", "globalcommunity.networks.com")
print("After replace")
print(newHtml)

Output:

Before replace
<body><img alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water." height="333" src="https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG" width="500"><br>Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>. </body>
After replace
<body><img alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water." height="333" src="https://globalcommunity.networks.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG" width="500"><br>Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>. </body>

For more help: https://www.w3schools.com/python/ref_string_replace.asp

  • Hi Mateus, thank you for the response. Your answer also replaces a regular string (or an iframe src) that contains tvfcommunity-dev-ed--c.documentforce.com. I am looking for a solution that replaces it only if it is present in an img src. – user3164444 Jul 08 '21 at 12:56
  • Could you iterate over your HTML tags? If so, you can check whether the tag being changed is an img. I can change the code if that resolves your situation. [python-looping-through-html](https://stackoverflow.com/questions/31729045/python-looping-through-html-tags-and-using-if) – Mateus Felipe Jul 08 '21 at 13:28
  • Hi Mateus, maybe that works. Instead of checking afterwards, it might help if I could check and replace in one step. – user3164444 Jul 12 '21 at 09:37
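Mateus's idea of checking that the tag is an img while replacing can also be sketched with only the standard library. This swaps in regular expressions instead of an HTML parser, so treat it as a rough sketch: it assumes attribute values never contain a `>` character, and the names OLD, NEW, and replace_img_src_domain are hypothetical. For arbitrary HTML, a real parser such as BeautifulSoup is more robust.

```python
import re

# Hypothetical constants for this sketch; adjust to your own domains.
OLD = "https://tvfcommunity-dev-ed--c.documentforce.com/"
NEW = "https://globalcommunity.networks.com/"

# An opening <img ...> tag. Assumes no ">" occurs inside attribute values,
# which holds for the sample HTML in the question but not for all HTML.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# A whitespace-preceded src attribute (so data-src is skipped) whose quoted
# value starts with OLD. Group 1 is the " src=" part, group 2 the opening quote.
SRC_ATTR = re.compile(r"""(\ssrc\s*=\s*)(["'])""" + re.escape(OLD), re.IGNORECASE)


def replace_img_src_domain(html):
    """Swap OLD for NEW, but only at the start of src values inside <img> tags."""
    def fix_tag(tag_match):
        tag = tag_match.group(0)
        # A function replacement avoids backslash-escaping issues in NEW.
        return SRC_ATTR.sub(lambda m: m.group(1) + m.group(2) + NEW, tag)

    return IMG_TAG.sub(fix_tag, html)


sample = (
    '<img alt="hosted on tvfcommunity-dev-ed--c.documentforce.com" '
    'src="https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage?eid=1">'
    '<iframe src="https://tvfcommunity-dev-ed--c.documentforce.com/embed"></iframe>'
    ' Plain text mentioning tvfcommunity-dev-ed--c.documentforce.com'
)

print(replace_img_src_domain(sample))
```

Only the img src changes here; the alt text, the iframe src, and the plain text keep the old domain, whereas a blanket str.replace on `sample` would rewrite all four occurrences.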

Thank you all for your valuable input.

I solved this issue using BeautifulSoup:

from bs4 import BeautifulSoup

html_doc = '<body><img alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water." height="333" src="https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG" width="500"/><br /> Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp; <a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp; <a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>. </body>'
modified_data = BeautifulSoup(html_doc, 'html.parser')

# Find every <img> tag that has a src attribute (avoids a KeyError) and change its domain
for tag in modified_data.find_all("img", src=True):
    tag['src'] = tag['src'].replace('https://tvfcommunity-dev-ed--c.documentforce.com/', 'https://globalcommunity.networks.com/')
print(modified_data)
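Because str.replace matches the substring anywhere in the src value, a slightly stricter variant compares only the URL's host. The sketch below uses the standard library's urllib.parse; the helper name swap_host is hypothetical, and it assumes the host appears without a port or a user:password@ prefix.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(url, old_host, new_host):
    """Return url with its host replaced by new_host, only if the host is old_host.

    The comparison is on the network location (netloc) alone, so a URL that
    merely mentions old_host in its path or query string is left unchanged.
    """
    parts = urlsplit(url)
    # netloc also holds any user:password@ prefix or :port suffix; this sketch
    # only swaps plain host names and leaves anything else untouched.
    if parts.netloc.lower() != old_host.lower():
        return url
    return urlunsplit(parts._replace(netloc=new_host))


old = "tvfcommunity-dev-ed--c.documentforce.com"
new = "globalcommunity.networks.com"
src = ("https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage"
       "?eid=ka02v000001BOfL&feoid=00N2v00000Rjh9i&refid=0EM2v000002ijZG")
print(swap_host(src, old, new))
print(swap_host("https://example.com/?next=" + old, old, new))
```

Inside the loop above it would be called as `tag['src'] = swap_host(tag['src'], 'tvfcommunity-dev-ed--c.documentforce.com', 'globalcommunity.networks.com')`. BeautifulSoup hands back attribute values with `&amp;` already decoded to `&`, which is what urlsplit expects.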
user3164444