How to transform hyperlink codes into normal URL strings?

Question

I'm trying to build a blog system, so I need to do things like transforming '\n' into <br /> and transforming http://example.com into <a href='http://example.com'>http://example.com</a>.

The former is easy: just use the string replace() method.

The latter is more difficult, but I found a solution here: Find Hyperlinks in Text using Python (twitter related)

But now I need to implement an "Edit Article" function, so I have to reverse this transformation.

So, how can I transform <a href='http://example.com'>http://example.com</a> into http://example.com?

Thanks! And I'm sorry for my poor English.

Detect and count the `<` and `>` characters and perform a substring operation on that anchor link ... — Predator, Aug 06 '11 at 08:14

score 5 · Answer 1 · answered Aug 06 '11 at 08:22

Sounds like the wrong approach. Making round-trips work correctly is always challenging. Instead, store the source text only, and only format it as HTML when you need to display it. That way, alternate output formats / views (RSS, summaries, etc) are easier to create, too.

Separately, we wonder whether this particular wheel needs to be reinvented again ...
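A minimal sketch of the "store the source, format on display" idea described above. The names `URL_RE` and `render_html` are hypothetical, and the URL pattern is a deliberately simple assumption rather than a complete URL matcher:

```python
import html
import re

# Hypothetical pattern: a bare http/https URL runs until whitespace,
# a quote, or an angle bracket. Real-world URL detection is messier.
URL_RE = re.compile(r'https?://[^\s<>"\']+')

def render_html(source_text):
    """Format stored plain text as HTML at display time only."""
    parts = []
    last = 0
    for match in URL_RE.finditer(source_text):
        # Escape the plain text between URLs so user input cannot inject markup.
        parts.append(html.escape(source_text[last:match.start()]))
        url = html.escape(match.group(0))
        parts.append('<a href="{0}">{0}</a>'.format(url))
        last = match.end()
    parts.append(html.escape(source_text[last:]))
    # Convert newlines to line breaks after escaping, so the tags survive.
    return ''.join(parts).replace('\n', '<br />')
```

With this arrangement the "Edit Article" form would simply show the stored source text again, so nothing needs to be converted back from HTML.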

tripleee

+1. Do not attempt to round-trip between plain text and HTML, that's a nightmare. I assume that the asker is doing this as a learning exercise, rather than because he needs a blog system, and reinventing wheels is often a good learning exercise. – Tom Anderson Aug 06 '11 at 08:25
You guys gave me really good feedback. I shouldn't do the round-trip work. Now I know I should solve this problem in a different way. Thanks! – 尤川豪 Aug 09 '11 at 11:23

score 2 · Accepted Answer · answered Aug 06 '11 at 08:12

Since you are using the answer from that other question, your links will always be in the same format, so it should be pretty easy with a regex. I don't know Python, but going by the answer from that question:

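import re

myString = 'This is my tweet check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>'

# Match an anchor whose href and link text are both URLs; group 1 is the href.
r = re.compile(r'<a href="(http://[^ ]+)">(http://[^ ]+)</a>')
# Replace each whole anchor with the bare URL captured in group 1.
print(r.sub(r'\1', myString))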

Should work.
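Note that the question writes the href with single quotes while the pattern above expects double quotes. A slightly more tolerant sketch follows, assuming the anchors were produced by your own linkifier (only an href attribute, URL as the link text). The names `ANCHOR_RE` and `unlink` are hypothetical, and this is not a general HTML parser:

```python
import re

# Assumption: anchors look like <a href="URL">...</a> or <a href='URL'>...</a>
# with no other attributes. Group 1 is the quote character, group 2 the URL.
ANCHOR_RE = re.compile(
    r'<a\s+href=(["\'])(https?://[^"\'<>\s]+)\1\s*>.*?</a\s*>',
    re.IGNORECASE | re.DOTALL,
)

def unlink(html_text):
    """Replace each generated anchor with the bare URL from its href."""
    return ANCHOR_RE.sub(r'\2', html_text)

print(unlink("see <a href='http://example.com'>http://example.com</a>!"))
```

Anchors written by hand or by another tool (extra attributes, nested tags) would likely need a real HTML parser instead.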
