I am working on an RSS news parser. The hrefs in the feed content come in very different forms: escaped or not escaped, URL-encoded or not URL-encoded:
URL-encoded:
https://www.lefigaro.fr/flash-eco/la-russie-a-gagne-93-0220613#:~:text=La%20Russie%20a%20engrang%C3%A9%2093,qui%20%C3%A9pingle%20particuli%C3%A8rement%20la%20France
Escaped:
http://mp.weixin.qq.com/s?__biz=MzI3MjE0NDA1MQ==&mid=2658568&idx=1&sn=b50084652c901&chksm=f0cb0fabcee7d4&scene=21#wechat_redirect
Not encoded & not escaped:
https://newsquawk.com/daily/article?id=2490-us-market-open-concerns&utm_source=tradingview&utm_medium=research&utm_campaign=partner-post
Additionally, the RSS feeds may contain unencoded unsafe characters to begin with:
https://www.unsafe.com/a<b>c{d}e[f ]\g^
I need to make all of these URLs formally "safe". It seems the only way to get a formally safe URL is to completely unescape and decode it first, and then encode it again?
Can I somehow normalize all of these different URLs? Is there a way to get a completely unescaped and decoded URL in Go?
func(url string) (completelyDecodedUrl string, err error) {
	// ??
}
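One possible direction, as a rough sketch rather than a confirmed solution: undo HTML escaping with html.UnescapeString, parse the result with net/url, and let the package re-encode the path, query and fragment on output. The name normalizeURL is hypothetical, and the sketch assumes that "escaped" means HTML-escaped (for example "&amp;" in place of "&").

```go
package main

import (
	"fmt"
	"html"
	"net/url"
)

// normalizeURL is a hypothetical helper, not part of any library.
// It assumes "escaped" input means HTML-escaped (e.g. "&amp;").
func normalizeURL(raw string) (string, error) {
	// Undo HTML escaping such as "&amp;" -> "&".
	unescaped := html.UnescapeString(raw)

	// Parse percent-decodes the path and fragment into u.Path and u.Fragment.
	u, err := url.Parse(unescaped)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", raw, err)
	}

	// Decode the query and encode it again. Note that Encode sorts the keys,
	// so the parameter order may change.
	q, err := url.ParseQuery(u.RawQuery)
	if err != nil {
		return "", fmt.Errorf("parse query of %q: %w", raw, err)
	}
	u.RawQuery = q.Encode()

	// String re-encodes any path or fragment characters that need escaping.
	return u.String(), nil
}

func main() {
	inputs := []string{
		"https://www.lefigaro.fr/flash-eco/la-russie-a-gagne-93-0220613#:~:text=La%20Russie%20a%20engrang%C3%A9%2093,qui%20%C3%A9pingle%20particuli%C3%A8rement%20la%20France",
		"http://mp.weixin.qq.com/s?idx=1&amp;sn=b50084652c901#wechat_redirect",
		`https://www.unsafe.com/a<b>c{d}e[f ]\g^`,
	}
	for _, raw := range inputs {
		out, err := normalizeURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Two caveats with this sketch: html.UnescapeString can misread a query parameter whose name starts like a legacy entity (for example "&copy=..."), and a single parse does not undo double percent-encoding, so input such as "%2520" would need an extra decoding pass.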