How can I replace the string between the quotes of an href="..." attribute?

<td><a href="https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n" target="_blank">https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n</a></td>
    </tr>

I want to replace this:

href="https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n"

with this:

href="LINK"
alb
  • What code have you written in an attempt to do this? Can you share it here as a minimal reproducible example, per the guidance in How to Ask? Where are you getting stuck? We aren't going to write your code for you, but we can assist if you can demonstrate a good-faith bare-minimum attempt at resolving this on your own before posting here. Why have you elected to use the `re` tag for this when it's widely accepted that [RegExp is not a good solution for parsing (X)HTML](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) due to the extensive edge cases you'll have to account for? – esqew Nov 29 '21 at 14:13
  • You tagged this with `re` but I think you should use an HTML library like BeautifulSoup, rather than trying to do this from scratch. – JeffUK Nov 29 '21 at 14:13
  • https://stackoverflow.com/questions/2782097/is-there-a-built-in-package-to-parse-html-into-dom – mplungjan Nov 29 '21 at 14:14
  • I am not scraping anything from the web. I am generating an email body and filling it with data from a dataframe. When I generate the HTML with df.to_html(), the whole link is shown in the mail like this, which is not readable for the user – alb Nov 29 '21 at 14:16
  • @alb Who said you were scraping anything from the web...? "*it is shown like this with the whole link in the mail, not being readable for the user*" You haven't stated what you *expect* or desire instead, nor what code you've written (as a minimal reproducible example) to either (a) generate this e-mail in the first place, or (b) resolve the issue you describe. Please edit your question to include this and all pertinent debugging/contextual information as prescribed by How to Ask. It's very difficult to answer this question as it currently stands. – esqew Nov 29 '21 at 14:18
  • And don't you mean you want to change the link text `https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n` to `LINK`? Otherwise it makes no sense – mplungjan Nov 29 '21 at 14:21
  • @esqew ***I*** asked if he was scraping – mplungjan Nov 29 '21 at 14:23

1 Answer


For a quick and dirty way, you could use re.sub() to match the href attribute and replace its value with your own:

import re
html = """<td><a href="https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n" target="_blank">https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n</a></td>
    </tr>"""
# Match href="..." (everything up to the next double quote) and swap in the new value
re.sub(r'href="[^"]*"', 'href="LINK"', html)

Output:

'<td><a href="LINK" target="_blank">https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n</a></td>\n    </tr>'
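
The comments under the question suggest the real goal may be to shorten the visible link text rather than the href value. A minimal sketch of that variant, assuming every anchor's text should become the literal word `LINK` and that no anchor contains nested tags (the variable name `shortened` is only illustrative):

```python
import re

html = (
    '<td><a href="https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n" '
    'target="_blank">https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n</a></td>'
)

# Group 1 is the whole opening <a ...> tag, [^<]* is the visible link text,
# group 2 is the closing </a>. Only the text in between is replaced.
shortened = re.sub(r'(<a\b[^>]*>)[^<]*(</a>)', r'\g<1>LINK\g<2>', html)
print(shortened)
```

The href stays untouched here, so the link still points at the form while the mail shows only the short label.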

But remember that parsing HTML with regular expressions is not recommended, because HTML has many edge cases. I would only use this as a quick fix when I know exactly how my input HTML is structured. For a more robust approach, you should look into HTML parsers (e.g. BeautifulSoup).
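
As a rough illustration of the parser route, here is a sketch that uses the standard library's `html.parser` instead of BeautifulSoup, purely so it runs without extra installs. The class `HrefRewriter` and the function `replace_hrefs` are hypothetical names introduced for this example. It assumes simple markup like the snippet in the question; comments and doctype declarations are not re-emitted.

```python
from html import escape
from html.parser import HTMLParser


class HrefRewriter(HTMLParser):
    """Re-emit the HTML it is fed, replacing the href of every <a> tag."""

    def __init__(self, new_href):
        # Keep character references as-is so text is passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.new_href = new_href
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            # Tags other than <a> are copied verbatim from the source.
            self.parts.append(self.get_starttag_text())
            return
        rebuilt = []
        for name, value in attrs:
            if name == "href":
                value = self.new_href
            # Attributes without a value (value is None) are kept bare.
            rebuilt.append(name if value is None else f'{name}="{escape(value)}"')
        self.parts.append("<" + " ".join(["a"] + rebuilt) + ">")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def replace_hrefs(html_text, new_href="LINK"):
    parser = HrefRewriter(new_href)
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


sample = (
    '<td><a href="https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n" '
    'target="_blank">https://forms.office.com/Pages/ResponsePage.aspx?id=uI1n</a></td>\n'
    '    </tr>'
)
print(replace_hrefs(sample))
```

Unlike the regex, this only touches href attributes that actually belong to an `<a>` tag, so a string such as `href="..."` appearing inside cell text is left alone.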