
I'm looking for a way to replace the newline characters found inside the attributes of an XML string in JavaScript. (I'm trying to replace them with &#10; so that I do not lose the newlines when parsing the XML back.)

Note: There are other newlines in the string, e.g. after <main> ... </body>, and those must be left alone.

(Open to any other regex/non-regex solutions!)

<main>
<head/><body><note text="Load"> 
<note text="2 Newlines at the end of this    

newline"/>
</body>
</main>

All I have so far is this awful thing: "[\s\S]*?([\r\n]+)[\s\S]*?"

https://regex101.com/r/vE2lD7/2

JNX
  • `str.replace(/\n/g, " ");` – DBS Mar 26 '16 at 20:37
  • A range of possible solutions here: http://stackoverflow.com/questions/784539/how-do-i-replace-all-line-breaks-in-a-string-with-br-tags – DBS Mar 26 '16 at 20:41
  • @DBS Thanks, that won't work unfortunately as there are other newlines in the string (outside of quotes) (added detail to question) – JNX Mar 26 '16 at 20:42
  • Can't you select that text specifically, then only run the replace on it? Seems a little strange to be running it on the entire structure. Off the top of my head: `xml.getElementsByTagName('note').getAttribute('text')` – DBS Mar 26 '16 at 20:45
  • Can't access attributes directly either. As soon as I $.parseXML with newlines, they get converted to single space ref: https://stackoverflow.com/questions/2004386/how-to-save-newlines-in-xml-attribute (+ need to start with a string) – JNX Mar 26 '16 at 20:58
  • You know, if xml only allows quoted val's in attr-val pairs, it might be do'able. The problem is cdata and comments, the csv syndrome. So there is a way around it, but the complexity won't fit in your simplified example, as it requires a fairly massive regex. I'm tired of posting these because people think regex should be simple.. which is fallacious. –  Mar 26 '16 at 21:00
  • @sln If you say it looks like a massive regex, I'll try to see if I have another less painful way I can go about this. Thanks. – JNX Mar 26 '16 at 21:07
  • Hey, but you know, massive to some is more than 200 characters. Use a SAX parser, you'd end up with only an additional 5-10k of code, but whatever.. –  Mar 26 '16 at 21:13

2 Answers


Here is a little snippet making the conversion:

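function convert(xmlStr) {
    // Parse the string into a detached element (this goes through the HTML parser)
    var xml = document.createElement('xml');
    xml.innerHTML = xmlStr;

    // Swap newlines in each text attribute for a placeholder
    [].forEach.call(xml.querySelectorAll('note[text]'), function (note) {
        note.setAttribute('text',
            note.getAttribute('text').replace(/\r?\n/g,'@#10'));
    });

    // After serialization, turn the placeholders into character references
    return xml.innerHTML.replace(/@#10/g, '&#10;');
}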

// I/O
var button = document.querySelector('button');
var input = document.querySelector('textarea');
var output = document.querySelector('pre');

// click handler
button.onclick = function () {
    output.textContent = convert(input.value);
}
textarea { width: 25em; height: 10em; float:left}
<textarea><main>
<head/><body><note text="Load"> 
<note text="2 Newlines at the end of this    

newline"/>
</body>
</main></textarea>
<button>convert</button>
<pre></pre>

Note that replacing with &#10; directly would cause the & to be escaped as &amp; when the element is serialized, hence the placeholder.
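
If no DOM is available, a string-only variant might look like the sketch below. The function name escapeAttrNewlines and the tag pattern are my own assumptions, not part of the snippet above. The pattern only recognizes start tags whose attribute values are quoted, and it would also rewrite tag-like text inside comments or CDATA sections, which is the problem the comments on the question warn about.

```javascript
// Hypothetical helper: replace newlines with &#10; inside quoted attribute values only.
function escapeAttrNewlines(xmlStr) {
    // A start tag or empty-element tag; attribute values may span several lines.
    var tagPattern = /<[A-Za-z_][^\s<>\/]*(?:\s+[^\s=<>\/]+\s*=\s*(?:"[^"<]*"|'[^'<]*'))*\s*\/?>/g;

    return xmlStr.replace(tagPattern, function (tag) {
        // Within one tag, touch only the quoted values, not the whitespace between attributes.
        return tag.replace(/"[^"]*"|'[^']*'/g, function (value) {
            return value.replace(/\r\n|\r|\n/g, '&#10;');
        });
    });
}

// Usage: newlines between elements are left untouched.
var sample = '<body>\n<note text="line one\nline two"/>\n</body>';
console.log(escapeAttrNewlines(sample));
```

Matching whole tags first keeps the replacement away from text nodes, so a stray quote in element content is not mistaken for an attribute value.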

trincot

https://regex101.com/r/vE2lD7/3

You want to make the regex non-greedy (lazy). Here's a nice article describing greedy, reluctant and possessive quantifiers:

https://docs.oracle.com/javase/tutorial/essential/regex/quant.html
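
For a quick feel of the difference, here is a minimal sketch in JavaScript. The sample string and variable names are made up for illustration. The lazy form `.*?` stops at the first closing quote, while the greedy `.*` runs on to the last one.

```javascript
// Hypothetical sample with two quoted values on one line.
var attrs = 'a="one" b="two"';

// Greedy: .* grabs as much as possible, so the match ends at the LAST quote.
var greedyMatch = attrs.match(/".*"/)[0];   // "one" b="two"

// Lazy: .*? grabs as little as possible, so the match ends at the FIRST closing quote.
var lazyMatch = attrs.match(/".*?"/)[0];    // "one"

console.log(greedyMatch);
console.log(lazyMatch);
```

Note that `.` does not match newlines in JavaScript, which is why the pattern in the question uses `[\s\S]` instead.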

N R