I want to replace the values of certain elements and attributes in arbitrary XML documents. I have a list of the element and attribute names whose values should be replaced. What is the most efficient and fastest way to do this? With regex, `.*` is very expensive in both performance and memory usage. Is there a way to minimize its use for this requirement? So far, I have this.

For example, I have XML documents that contain the following (these samples are for explanation purposes only):

(1)

<Book>
    <author>Uncommon Passion</author>
    <title>Anne Calhoun</title>
</Book>

(2)

<Book author="Anne Calhoun">Uncommon Passion</Book>

(3)

<Article author="James Clear">Habit</Article>

I want to replace the value of the author element in XML 1 and, at the same time, the value of the author attribute in XML 2 and 3. The incoming XML documents might have totally different tree structures.

  • 4
    The safest and most standard conform way would be using an XSLT processor. And if you want to transform several elements/attributes at once, it would probably be the fastest, too. – zx485 Mar 17 '19 at 00:16
  • You should not show code for this question. Show a list of the attribute values you want to find and in which tag (element), then show what you want to replace them with. Then I'll give you the regex template. –  Mar 17 '19 at 00:19
  • 3
    @sln: Are you sure that using RegEx'es on XML files is a good idea? Because usually it's not. – zx485 Mar 17 '19 at 00:20
  • 3
    Regex is not the right tool for irregular languages (more info: https://stackoverflow.com/questions/701166, https://stackoverflow.com/questions/590747, and ofc https://stackoverflow.com/a/1732454). Consider using a proper XML parser like Jsoup. – Pshemo Mar 17 '19 at 00:21
  • @zx485 - `Are you sure that using RegEx'es on XML files is a good idea? Because usually it's not` Yeah, I think the first level _TAG_ parse, is not parsing SGML. Otherwise, the w3c specs wouldn't define tag parsing using pure regex. Dumb, Dummer, Dummest. All these people exist on the planet. https://regex101.com/r/4vjduH/1 –  Mar 17 '19 at 00:39
  • @GraceDePaz - I have an appointment, you might have missed your chance for a solution today. I'll be here tomorrow briefly should you decide to comment... –  Mar 17 '19 at 00:41
  • @sln What about matching patterns that occur in comments and processing instructions? What about `<Name>` instead of ``? What about parsed entities? – VGR Mar 17 '19 at 01:28
  • 2
    Do not parse XML with regex. Use a real XML parser. – kjhughes Mar 17 '19 at 01:42
  • @sln - Sorry if I failed to emphasize that the solution I'm trying to build should cater to any XML. Meaning, the tree structure might differ from one document to another. I updated my question. Thank you! – Grace De Paz Mar 17 '19 at 03:15
  • FYI, you can also parse and process XML using a language recognition tool such as [*ANTLR*](https://en.wikipedia.org/wiki/ANTLR). – Basil Bourque Mar 17 '19 at 04:54
  • `Then I want to replace the value of the author element from xml 1 and at the same time author attribute of xml 2` You mean the _value of the author attribute_ for number 2 I presume. –  Mar 17 '19 at 06:51
  • For number 2, the regex string and replacement string can be constructed dynamically. Using your example, it would be Find: `("']|"[^"]*"|'[^']*')*?\sauthor\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>)` Replace: `$1$2Peter Frampton$2$4` https://regex101.com/r/JjDWeG/1 Let me know if you want this for an answer that you accept. –  Mar 17 '19 at 07:02
  • @VGR - Unless you're looking for stuff inside _Comments or CDATA_, the idea is to _SKIP/FAIL_ these constructs. For Java, that means matching them. Ah, are you just learning xml or regex ? –  Mar 17 '19 at 07:08
  • @sln - That actually works if the element name is always Book. But what if I have this: `<Article author="James Clear">Habit</Article>` – Grace De Paz Mar 17 '19 at 09:00
  • @sln I look forward to seeing your regex for skipping comments, processing instructions, and CDATA sections. As others keep trying to tell you, it can’t be done. And I notice you didn’t cover parsed entities; perhaps it is you who haven’t finished learning XML. Even if it could be done, the regex would be a mess. Grunt coders write code that works; software engineers write code that works and is readable and maintainable. – VGR Mar 17 '19 at 13:51
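
Several comments above recommend a real XML parser over regex. As a rough illustration of that approach, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The function name `replace_values` and the replacement value `"Jane Doe"` are hypothetical and do not come from the thread. The sketch assumes the documents use no namespaces and that a matching element holds plain text rather than child elements. The same tree walk should carry over to other parsers, or to the XSLT route suggested in the first comment.

```python
import xml.etree.ElementTree as ET


def replace_values(xml_text, names, new_value):
    """Set every element or attribute whose name is in `names` to `new_value`.

    Hypothetical helper for illustration. Assumes no XML namespaces
    (a namespaced tag would appear as "{uri}author" and not match).
    """
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the document structure does not matter.
    for elem in root.iter():
        if elem.tag in names:
            # Replaces only the element's leading text; child elements are kept.
            elem.text = new_value
        # Copy the attribute names first so the mapping is not iterated while updated.
        for attr in list(elem.attrib):
            if attr in names:
                elem.set(attr, new_value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    samples = [
        "<Book>\n    <author>Uncommon Passion</author>\n    <title>Anne Calhoun</title>\n</Book>",
        '<Book author="Anne Calhoun">Uncommon Passion</Book>',
        '<Article author="James Clear">Habit</Article>',
    ]
    for sample in samples:
        print(replace_values(sample, {"author"}, "Jane Doe"))
```

Because the parser handles escaping and entity decoding, the replacement text needs no manual quoting. Note that the default `ElementTree` parser discards comments and processing instructions, so a document that must keep them would need a different parser configuration or library.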
