I want to rewrite all links in a web page so that they point to a reverse proxy domain.
The rules are:
https://test.com/xxx --> https_test_com.proxy.com/xxx
http://sub.test.com/xxx --> http_sub_test_com.proxy.com/xxx
How can I achieve this with a regex in Go?
The response body is a []byte, and its character encoding is UTF-8.
I have tried the approach below, but it does not replace the dots in the original domain with underscores. The number of subdomain levels is variable, so the number of dots can vary.
```go
respBytes := []byte(`_.Xc=function(a){var b=window.google&&window.google.logUrl?"":"https://www.google.com";b+="/gen_204?";b+=a.j(2040-b.length);
<cite class="iUh30 Zu0yb tjvcx">https://cloud.google.com</cite></div><div class="eFM0qc"><a class="fl" href="https://webcache.googleusercontent.com/search?q=cache:80SWJ_cSDhwJ:https://cloud.google.com/+&cd=1&hl=en&ct=clnk&gl=au" ping="/url?sa=t&source=web&rct=j&url=https://webcache.googleusercontent.com/search%3Fq%3Dcache:80SWJ_cSDhwJ:https://cloud.google.com/%2B%26cd%3D1%26hl%3Den%26ct%3Dclnk%26gl%3Dau&ved=2ahUKEwia5ovYsv3xAhXS4jgGHad0BJYQIDAAegQIBRAG"><span>Cached</span></a></li><li class="action-menu-item OhScic zsYMMe" role="menuitem"><a class="fl" href="/search?q=related:https://cloud.google.com/+google+cloud&sa=X&ved=2ahUKEwia5ovYsv3xAhXS4jgGHad0BJYQHzAAegQIBRAH">
`)
proxyURI := "proxy.com"
var re = regexp.MustCompile(`(http[s]*):\/\/([a-zA-Z0-9_\-.:]*)`)
content := re.ReplaceAll(respBytes, []byte("${1}_${2}."+proxyURI))
```

| origin | result | expected |
|---|---|---|
| https://www.google.com | https_www.google.com.proxy.com | https_www_google_com.proxy.com |
| https://cloud.google.com | https_cloud.google.com.proxy.com | https_cloud_google_com.proxy.com |
| https://webcache.googleusercontent.com | https_webcache.googleusercontent.com.proxy.com | https_webcache_googleusercontent_com.proxy.com |
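
A replacement template such as `${1}_${2}` can only copy captured groups verbatim, so it cannot turn a variable number of dots into underscores. One possible direction, shown here only as a minimal sketch, is to use `ReplaceAllFunc` and transform the host inside the callback. The helper name `rewriteLinks` and the variable `linkRe` are hypothetical, and the sketch assumes that a port (`:8080`) is not part of the host to be rewritten, which the rules above do not specify.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// linkRe captures the scheme (group 1) and the host (group 2).
// The host is one or more dot-separated labels, so a trailing dot
// (for example at the end of a sentence) is not treated as part of it.
// Assumption: ports are left out of the host on purpose.
var linkRe = regexp.MustCompile(`(https?)://([a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*)`)

// rewriteLinks is a hypothetical helper: it rewrites every
// scheme://host in body to scheme_host-with-underscores.proxyHost.
func rewriteLinks(body []byte, proxyHost string) []byte {
	return linkRe.ReplaceAllFunc(body, func(m []byte) []byte {
		// ReplaceAllFunc passes only the matched bytes, so match again
		// to recover the capture groups.
		parts := linkRe.FindSubmatch(m)
		host := bytes.ReplaceAll(parts[2], []byte("."), []byte("_"))

		out := make([]byte, 0, len(m)+len(proxyHost)+1)
		out = append(out, parts[1]...)
		out = append(out, '_')
		out = append(out, host...)
		out = append(out, '.')
		out = append(out, proxyHost...)
		return out
	})
}

func main() {
	body := []byte(`<a href="https://cloud.google.com/docs">x</a> http://sub.test.com/xxx`)
	fmt.Println(string(rewriteLinks(body, "proxy.com")))
	// Prints:
	// <a href="https_cloud_google_com.proxy.com/docs">x</a> http_sub_test_com.proxy.com/xxx
}
```

Note that this sketch also rewrites URLs that appear inside query strings (such as the `cache:...:https://cloud.google.com/` fragment in the sample), but it does not touch percent-encoded URLs like `https%3A%2F%2F...`, since those do not contain a literal `://`.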