
I have XML output like this (the <xml> element and the xlink:href attribute are just made-up examples, so you cannot rely on them to build the regex pattern):

<xml>http://localhost:8080/def/abc/xyx</xml>
<element xlink:href="http://localhostABCDEF/def/ABC/XYZ">Some Text</element>
...

What I want to do is use a Java regex to replace the domain part matched by this pattern (I don't know the existing domains in advance):

http(s)?://.*/def/.*

with an input domain (e.g. http://google.com/def), so that the result is:

<xml>http://google.com/def/abc/xyx</xml>
<element xlink:href="http://google.com/def/ABC/XYZ">Some Text</element>
...

How can I do it? I think a Java regex or String.replaceAll should be able to do this, but I could not get replaceAll to work.

Mr Lister
Bằng Rikimaru
  • I'd use an XML parser (DOM if your file is small, event-driven otherwise) and for each element that you know will contain the URL (in your case, `xml` apparently), initialize a `URL` object with that element, get the `path` of the `URL` object, and construct again with a different domain. – Mena Feb 02 '18 at 16:33
  • Use back references in your regex "http(s)?://(.*)/def/.*" – LMC Feb 02 '18 at 16:41
  • Epic answer to a closely related question: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Andrey Tyukin Feb 02 '18 at 16:43
  • Why is the path in your `xml` tag lowercase (`/def/abc/xyx`) and the one in `href` uppercase (`/def/ABC/XYZ`)? – Srdjan M. Feb 02 '18 at 16:53
  • @LuisMuñoz you didn't put any backreferences in what you posted, you just copied the original regex and pasted it. Am I missing something? – ctwheels Feb 02 '18 at 16:54
  • @S.Kablar that is what makes it difficult, as you don't know the parts before and after '/def'. – Bằng Rikimaru Feb 02 '18 at 16:55
  • You want to replace everything before and after `def`? – Srdjan M. Feb 02 '18 at 17:01
  • @S.Kablar I want to keep the part after 'def' and replace the domain before 'def' with input domain. – Bằng Rikimaru Feb 02 '18 at 17:04
  • Do you want to modify the entire HTML tag, or have you extracted the links to be edited? – Srdjan M. Feb 02 '18 at 17:13
  • @S.Kablar I don't want to extract, I want to replace link in the input XML string directly, so it is a modification. – Bằng Rikimaru Feb 02 '18 at 17:14
  • Yes, you are missing the parentheses around the host part of the pattern. In fact, this would be more correct, since you don't need to capture the s: `"http[s]?://(.*)/def/.*"` – LMC Feb 02 '18 at 17:15

2 Answers

0

Regex: http[s]?:\/{2}.+\/def

Substitution: http://google.com/def

Details:

  • ? Matches between zero and one times
  • [] Match a single character present in the list
  • . Matches any character
  • + Matches between one and unlimited times
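To make the tokens above concrete, here is a small sketch that runs the same pattern against a few made-up strings. The class name `TokenDemo` and the sample URLs are assumptions for illustration only, not part of the original question.

```java
class TokenDemo {
    // Same pattern as in this answer, written as a Java string literal.
    static final String REGEX = "http[s]?:\\/{2}.+\\/def";

    public static void main(String[] args) {
        // [s]? makes the "s" optional, so both schemes match.
        System.out.println("http://a/def".matches(REGEX));
        System.out.println("https://a/def".matches(REGEX));
        // .+ needs at least one character between "//" and "/def".
        System.out.println("http:///def".matches(REGEX));
        // .+ is greedy: the match extends to the last "/def" in the input.
        System.out.println("http://a/def/x/def/y".replaceAll(REGEX, "#"));
    }
}
```

This should print true, true, false and #/y. The last line shows that a greedy `.+` swallows everything up to the final `/def`, which matters if the path itself contains another `/def`.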

Java code:

String domain = "http://google.com/def";
String html = "<xml>http://localhost:8080/def/abc/xyx</xml>\r\n<element xlink:href=\"http://localhostABCDEF/def/ABC/XYZ\">Some Text</element>";

html = html.replaceAll("http[s]?:\\/{2}.+\\/def", domain);
System.out.print(html);

Output:

<xml>http://google.com/def/abc/xyx</xml>
<element xlink:href="http://google.com/def/ABC/XYZ">Some Text</element>
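One caveat with `.+`: it is greedy and matches any character except a line break, so if two URLs sit on the same line the match runs from the first `http` to the last `/def`. A possible variant is sketched below, under the assumption that the URLs never contain whitespace, quotes or angle brackets. The class and method names are hypothetical, and the sample input is made up.

```java
import java.util.regex.Matcher;

class DomainRewrite {
    // Hypothetical helper: replaces everything up to and including the
    // first "/def" of each URL with newPrefix.
    static String rewrite(String xml, String newPrefix) {
        // [^\s"'<>]*? is reluctant and cannot cross whitespace, quotes or
        // tag brackets, so one match never runs into the next URL.
        // (?=/) requires a "/" after "def" without consuming it.
        String regex = "https?://[^\\s\"'<>]*?/def(?=/)";
        // quoteReplacement keeps any '$' or '\' in newPrefix literal.
        return xml.replaceAll(regex, Matcher.quoteReplacement(newPrefix));
    }

    public static void main(String[] args) {
        // Two URLs on one line, which the greedy pattern would merge.
        String xml = "<a href=\"http://localhost:8080/def/abc\"/>"
                   + "<b>https://example.org/def/XYZ</b>";
        System.out.println(rewrite(xml, "http://google.com/def"));
    }
}
```

With this input both URLs are rewritten independently, and the markup between them is left alone.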
Srdjan M.
  • I appreciate your effort, but the element names and the xlink:href attribute are just made up, so you cannot rely on them to build the regex pattern. They could be different element names and attributes; what I want is to replace the domain name in the XML string, which seems possible with XPath. – Bằng Rikimaru Feb 02 '18 at 17:21
  • @Bằng Rikimaru Updated. – Srdjan M. Feb 02 '18 at 17:27
0

Actually, this can be done with a regex, and it is simpler than parsing the XML document. Here is the answer:

String text = "<epsg:CommonMetaData>\n"
            + "      <epsg:type>geographic 2D</epsg:type>\n"
            + "      <epsg:informationSource>EPSG. See 3D CRS for original information source.</epsg:informationSource>\n"
            + "      <epsg:revisionDate>2007-08-27</epsg:revisionDate>\n"
            + "      <epsg:changes>\n"
            + "        <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2002.151\"/>\n"
            + "        <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2003.370\"/>\n"
            + "        <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2006.810\"/>\n"
            + "        <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2007.079\"/>\n"
            + "      </epsg:changes>\n"
            + "      <epsg:show>true</epsg:show>\n"
            + "      <epsg:isDeprecated>false</epsg:isDeprecated>\n"
            + "    </epsg:CommonMetaData>\n"
            + "  </gml:metaDataProperty>\n"
            + "  <gml:metaDataProperty>\n"
            + "    <epsg:CRSMetaData>\n"
            + "      <epsg:projectionConversion xlink:href=\"http://www.opengis.net/def/coordinateOperation/EPSG/0/15593\"/>\n"
            + "      <epsg:sourceGeographicCRS xlink:href=\"http://www.opengis.net/def/crs/EPSG/0/4979\"/>\n"
            + "    </epsg:CRSMetaData>\n"
            + "  </gml:metaDataProperty>"
            + "<gml:identifier codeSpace=\"OGP\">http://www.opengis.net/def/area/EPSG/0/1262</gml:identifier>";

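// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
// Matches from the scheme to the end of the line, provided "/def/" occurs in between.
String patternString1 = "(http(s)?://.*/def/.*)";

Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);

String prefixDomain = "http://localhost:8080/def";

StringBuffer sb = new StringBuffer();

while (matcher.find()) {
    // Keep everything after the first "def"; limit 2 so a later "def" in the path is not cut off.
    String url = prefixDomain + matcher.group(1).split("def", 2)[1];
    // quoteReplacement stops '$' and '\' in the URL from being read as group references.
    matcher.appendReplacement(sb, Matcher.quoteReplacement(url));
    System.out.println(url);
}
// Copy whatever follows the last match.
matcher.appendTail(sb);
System.out.println(sb.toString());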

which produces this output: https://www.diffchecker.com/CyJ8fY8p
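The same replacement can probably be written without an explicit loop, using a capturing group and a `$1` back reference as suggested in the comments on the question. This is only a sketch: the class and method names are hypothetical, and it assumes URLs end at whitespace, a quote or an angle bracket.

```java
import java.util.regex.Matcher;

class BackReferenceRewrite {
    // Hypothetical helper: keeps the path after "/def" (group 1) and
    // puts newPrefix in front of it.
    static String rewrite(String xml, String newPrefix) {
        // Group 1 captures "/" plus the rest of the URL, stopping at
        // whitespace, a quote or a tag bracket.
        String regex = "https?://[^\\s\"'<>]*?/def(/[^\\s\"'<>]*)";
        // newPrefix is quoted so it is taken literally; "$1" re-inserts group 1.
        return xml.replaceAll(regex, Matcher.quoteReplacement(newPrefix) + "$1");
    }

    public static void main(String[] args) {
        String xml = "<epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2002.151\"/>\n"
                   + "<gml:identifier codeSpace=\"OGP\">http://www.opengis.net/def/area/EPSG/0/1262</gml:identifier>";
        System.out.println(rewrite(xml, "http://localhost:8080/def"));
    }
}
```

Because the character class stops at quotes and angle brackets, the trailing `"/>` and closing tags are never part of the match and do not need to be carried through the replacement.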

Bằng Rikimaru