I would like to remove everything beginning with <?xml version= and <gpx except the first two occurrences at the very beginning of the myxml string. How can I do that using a regex in JavaScript?

myxml = `
<?xml version="1.0"?>
<gpx creator="GS_1">
<metadata>
<desc>GPX file from TRV_1_PATH_1</desc>
</metadata>
<rte>
<name>Traverse path</name>
<rtept lat="-13.701582" lon="29.043733"/>
<rtept lat="-13.702719" lon="29.043939"/>
<rtept lat="-13.704522" lon="29.043846"/>
<rtept lat="-13.704886" lon="29.043939"/>
<rtept lat="-13.705208" lon="29.043733"/>
<rtept lat="-13.705723" lon="29.043827"/>
<rtept lat="-13.705852" lon="29.04362"/>
<rtept lat="-13.706088" lon="29.043789"/>
<rtept lat="-13.70656" lon="29.043489"/>
<rtept lat="-13.707612" lon="29.043902"/>
<rtept lat="-13.708019" lon="29.043827"/>
<rtept lat="-13.708534" lon="29.044296"/>
<rtept lat="-13.709564" lon="29.044221"/>
<rtept lat="-13.710144" lon="29.04469"/>
<rtept lat="-13.71141" lon="29.045177"/>
<rtept lat="-13.712161" lon="29.04514"/>
<rtept lat="-13.712611" lon="29.045515"/>
<rtept lat="-13.713255" lon="29.045177"/>
<rtept lat="-13.714392" lon="29.044877"/>
<rtept lat="-13.714457" lon="29.044446"/>
<rtept lat="-13.715315" lon="29.044033"/>
</rte>
</gpx>
<?xml version="1.0"?>
<gpx creator="GS_4">
<metadata>
<desc>GPX file from TRV_1_PATH_2</desc>
</metadata>
<rte>
<name>Traverse path</name>
<rtept lat="-13.715379" lon="29.043996"/>
<rtept lat="-13.716795" lon="29.044465"/>
<rtept lat="-13.718061" lon="29.044202"/>
<rtept lat="-13.718662" lon="29.043902"/>
<rtept lat="-13.718619" lon="29.043433"/>
<rtept lat="-13.71922" lon="29.04347"/>
<rtept lat="-13.719907" lon="29.043001"/>
<rtept lat="-13.7204" lon="29.042213"/>
<?xml version="1.0"?>
<gpx creator="GS_1">
<metadata>
<desc>GPX file from TRV_1_PATH_3</desc>
</metadata>
<rte>
<name>Traverse path</name>
<rtept lat="-13.7204" lon="29.042138"/>
<rtept lat="-13.720615" lon="29.041407"/>
<rtept lat="-13.721237" lon="29.041144"/>
<rtept lat="-13.721838" lon="29.041275"/>
<rtept lat="-13.722396" lon="29.040994"/>
<rtept lat="-13.723104" lon="29.041613"/>
<rtept lat="-13.725228" lon="29.042945"/>
<rtept lat="-13.727052" lon="29.043977"/>
<rtept lat="-13.729327" lon="29.044521"/>
<rtept lat="-13.731387" lon="29.044352"/>
<rtept lat="-13.732653" lon="29.043414"/>
<rtept lat="-13.733554" lon="29.04197"/>
<?xml version="1.0"?>
<gpx creator="GS_7">
<metadata>
<desc>GPX file from TRV_1_PATH_4</desc>
</metadata>
<rte>
<name>Traverse path</name>
<rtept lat="-13.733683" lon="29.041913"/>
<rtept lat="-13.734305" lon="29.041763"/>
<rtept lat="-13.734434" lon="29.042026"/>
<rtept lat="-13.73394" lon="29.043076"/>
<rtept lat="-13.733554" lon="29.044202"/>
<rtept lat="-13.733447" lon="29.045252"/>
`;


I tried many things with the .replace function but have not yet found how to do that, in particular how to remove specific nth occurrences of a pattern in the string.

Wiktor Stribiżew
ADL92

1 Answer

Use a positive lookbehind that matches the last element you want to preserve, so that only occurrences appearing after it are removed.

let combined=myxml.replaceAll(
  /(?<=gpx[\s\S]*)\s*(?:<\?xml version="1.0"\?>|<gpx [^>]+>|<\/gpx>)/g,''
) + '</gpx>';

In this case, you're deleting every match seen after the first instance of "gpx", then appending a single closing </gpx> tag, because every existing one has been removed.

(?<=gpx[\s\S]*)              Match only if prior content contains 'gpx' + anything
\s*                          Match any number of whitespace characters
(?:<\?xml version="1.0"\?>   Followed by <?xml version="1.0"?>
  |<gpx [^>]+>                 or the tag <gpx ...>
  |<\/gpx>                     or the closing tag </gpx>
)
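
To see the pattern above in action, here is a minimal sketch that runs it on a shortened stand-in for myxml. The names `sample`, `pattern` and `countOf` are illustrative and not from the question. It assumes an engine with lookbehind and `String.prototype.replaceAll` support (for example a recent Node.js or browser). Note that in the question's data the later fragments have no closing </rte>, so the real output may still need repair.

```javascript
// Shortened stand-in for the question's myxml: two concatenated GPX fragments.
const sample = `
<?xml version="1.0"?>
<gpx creator="GS_1">
<rte>
<rtept lat="-13.701582" lon="29.043733"/>
</rte>
</gpx>
<?xml version="1.0"?>
<gpx creator="GS_4">
<rte>
<rtept lat="-13.715379" lon="29.043996"/>
</rte>
`;

// Same pattern as in the answer: a match is removed only when "gpx"
// already occurs somewhere before it in the string.
const pattern =
  /(?<=gpx[\s\S]*)\s*(?:<\?xml version="1.0"\?>|<gpx [^>]+>|<\/gpx>)/g;

// Every </gpx> is removed by the pattern, so one is appended at the end.
const combined = sample.replaceAll(pattern, '') + '</gpx>';

// Small helper to count non-overlapping occurrences of a substring.
const countOf = (text, sub) => text.split(sub).length - 1;

console.log(combined);
console.log(countOf(combined, '<?xml'), countOf(combined, '<gpx '));
```

The first declaration and the first <gpx ...> tag survive because nothing containing "gpx" precedes them, so the lookbehind fails at those positions.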
phatfingers
  • And if I wanted to delete all the elements except the first one, how would I do that? I tried something similar but it didn't work – ADL92 Feb 22 '23 at 12:55
  • If possible, you should try to solve the integration problem in a consistently valid state. Some process is glomming multiple xml documents or fragments together into one invalid document that you're fixing. Is it possible to accomplish the same thing while keeping each document intact? – phatfingers Feb 22 '23 at 15:08
  • Here's a basic pattern: `/(?<=[\s\S]*)\s*(?:|)/g` – phatfingers Feb 22 '23 at 15:16
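
For the follow-up about keeping only the first occurrence of an element, one alternative that avoids lookbehind is a replacement callback with a counter. This is a different technique from the pattern in the comment above, and `keepFirstMatch` is a hypothetical helper name used only for illustration; it is a sketch, assuming the element can be matched by a single regex.

```javascript
// Hypothetical helper: keep the first match of `pattern` in `text`
// and delete every later match.
function keepFirstMatch(text, pattern) {
  // Make sure the regex is global so that replace() visits every match.
  const globalPattern = pattern.global
    ? pattern
    : new RegExp(pattern.source, pattern.flags + 'g');
  let seen = 0;
  // The callback returns the match unchanged the first time, '' afterwards.
  return text.replace(globalPattern, (match) => (seen++ === 0 ? match : ''));
}

// Small stand-in for the concatenated GPX fragments in the question.
const doc = [
  '<?xml version="1.0"?>',
  '<gpx creator="GS_1">',
  '<rtept lat="-13.701582" lon="29.043733"/>',
  '<?xml version="1.0"?>',
  '<gpx creator="GS_4">',
  '<rtept lat="-13.715379" lon="29.043996"/>',
].join('\n');

// Apply the helper once per kind of element that should appear only once.
let cleaned = keepFirstMatch(doc, /\s*<\?xml[^>]*\?>/g);
cleaned = keepFirstMatch(cleaned, /\s*<gpx [^>]+>/g);
console.log(cleaned);
```

The leading `\s*` in each pattern also removes the line break before a deleted tag, so no empty lines are left behind.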