I'm trying to extract only the XML from input in this format:

--------------------------3cbec9ce8f05
Content-Disposition: form-data; name="owServerData"; filename="details.xml"
Content-Type: text/plain

<?xml version="1.0" encoding="UTF-8"?>
<Devices-Detail-Response xmlns="http://www.example.com"> 
 // Rest omitted
</Devices-Detail-Response>
------------------------------3cbec9ce8f05--

Basically, I want everything from the first '<' through the last '>', inclusive.

So far I have .*<(.*)>.* which only returns <?xml version="1.0" encoding="UTF-8"?>

Thanks!

Expected result:

<?xml version="1.0" encoding="UTF-8"?>
<Devices-Detail-Response xmlns="http://www.example.com"> 
 // Rest omitted
</Devices-Detail-Response>
Allan
Marcel
  • which language? – Allan Dec 13 '18 at 01:42
  • @Allan it's for an application in C# – Marcel Dec 13 '18 at 01:44
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Mad Physicist Dec 13 '18 at 01:47
  • Ok thanks, for the ` // Rest omitted ` what do you expect as output? also there is a `>` missing after the `xmlns=http://www.example.com"` right? – Allan Dec 13 '18 at 01:47
  • @Allan yes, and i edited the post with the expected result – Marcel Dec 13 '18 at 01:52
  • Thanks I have answered the question ;-) Let me know if it works for you! Regex is not uniform and the syntax, features depend a lot on the language. – Allan Dec 13 '18 at 02:11
  • Do you know for sure that the text before and after the XML document does not contain any `<` or `>`? For example, can the `name` tag contain arbitrary text? – Eric Lippert Dec 13 '18 at 14:31
  • @EricLippert Yes, I know for sure, because the data comes from embedded software; the names will not change, only the data – Marcel Dec 13 '18 at 22:40

2 Answers

By default, a period does not match line breaks. You need the s (single-line, or "dotall") modifier for that; in .NET this is RegexOptions.Singleline or the inline (?s).

Also, your leading .* is greedy: it consumes every < up to the last one that still allows the rest of the pattern to match. Rather than switching to a non-greedy .*?, I would use a negated character class that matches everything except <.

/[^<]*<(.*)>.*/s

See it on Regex101

If the s modifier is not available, an alternative is to match \r (carriage return) and \n (newline) explicitly through a character class alongside the dot:

/[^<]*<((?:.|[\r\n])*)>.*/
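
To make the effect of the s modifier concrete, here is a minimal sketch in Python rather than C# (an assumption made purely for illustration; Python's re.DOTALL plays the same role as RegexOptions.Singleline in .NET). The payload is the sample from the question, and the angle brackets are moved inside the capturing group, as suggested in the comments, so the captured text includes them.

```python
import re

# Sample payload copied from the question (multipart wrapper around the XML).
payload = (
    "--------------------------3cbec9ce8f05\n"
    'Content-Disposition: form-data; name="owServerData"; filename="details.xml"\n'
    "Content-Type: text/plain\n"
    "\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<Devices-Detail-Response xmlns="http://www.example.com"> \n'
    " // Rest omitted\n"
    "</Devices-Detail-Response>\n"
    "------------------------------3cbec9ce8f05--\n"
)

# Without DOTALL, "." stops at line breaks, so the capture stays on one line.
single_line = re.search(r"[^<]*(<.*>).*", payload)

# With DOTALL (the "s" modifier), the greedy ".*" runs through to the last ">".
multi_line = re.search(r"[^<]*(<.*>).*", payload, re.DOTALL)

print(single_line.group(1))  # only the XML declaration
print(multi_line.group(1))   # the whole XML document
```

This relies on the asker's guarantee that the text around the XML never contains < or >; if that stops being true, the capture will grow or shrink accordingly.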
K.Dᴀᴠɪs
  • You didn't tag a language, but JS does support the `s` flag. I'm not overly familiar with regexr.com, but your extended data [works here](https://regex101.com/r/LMI2g0/4) – K.Dᴀᴠɪs Dec 13 '18 at 01:38
  • Ah, excuse my ignorance, I thought regex was the same everywhere. I'm using it in a C# application – Marcel Dec 13 '18 at 01:43
  • See if the update works: `/[^<]*<((?:.|[\r\n])*)>.*/` -- [Regex101](https://regex101.com/r/LMI2g0/5). Also, if you intended to _include_ the first/last `< >` in your match, just move those characters _inside_ the capturing group: `/[^<]*(<(?:.|[\r\n])*>).*/` – K.Dᴀᴠɪs Dec 13 '18 at 01:45
  • The update returns everything after `/plain...` until `...0` – Marcel Dec 13 '18 at 01:55
  • Hmm. Try this: `^(?:[^<]|[\r\n])*(<(?:.|[\r\n])*>)(?:[\r\n]|[^>])*$` – K.Dᴀᴠɪs Dec 13 '18 at 02:03

You can try the following regex:

<\?xml version="1\.0" encoding="UTF-8"\?>\s*<\s*([^\s]*)(?:.|\s)*<\/\s*\1\s*>

Input:

--------------------------3cbec9ce8f05
Content-Disposition: form-data; name="owServerData"; filename="details.xml"
Content-Type: text/plain

<?xml version="1.0" encoding="UTF-8"?>
<Devices-Detail-Response xmlns="http://www.example.com"> 
 <device>a</device>
 <info>abc</info>
 <test1><u>123</u><v>456</v><z/></test1>
</Devices-Detail-Response>
------------------------------3cbec9ce8f05--

Output:

<?xml version="1.0" encoding="UTF-8"?>
<Devices-Detail-Response xmlns="http://www.example.com"> 
 <device>a</device>
 <info>abc</info>
 <test1><u>123</u><v>456</v><z/></test1>
</Devices-Detail-Response>

demo: https://regex101.com/r/r6Kbh2/3/
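
For readers who want to try the pattern outside Regex101, here is a small sketch in Python (chosen only for illustration; the question targets C#, where the same pattern should work with the Regex class, though that is not tested here). It uses the regex and sample input from this answer unchanged; the names PATTERN, payload and xml_text are made up for the example.

```python
import re

# The regex from this answer, split up so each part can be commented.
PATTERN = re.compile(
    r'<\?xml version="1\.0" encoding="UTF-8"\?>'  # literal XML declaration
    r'\s*<\s*([^\s]*)'                             # root element name -> group 1
    r'(?:.|\s)*'                                   # any character, newlines included
    r'<\/\s*\1\s*>'                                # closing tag with the same name
)

# Sample input copied from this answer.
payload = (
    "--------------------------3cbec9ce8f05\n"
    'Content-Disposition: form-data; name="owServerData"; filename="details.xml"\n'
    "Content-Type: text/plain\n"
    "\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<Devices-Detail-Response xmlns="http://www.example.com"> \n'
    " <device>a</device>\n"
    " <info>abc</info>\n"
    " <test1><u>123</u><v>456</v><z/></test1>\n"
    "</Devices-Detail-Response>\n"
    "------------------------------3cbec9ce8f05--\n"
)

match = PATTERN.search(payload)
# group(0) is the whole match: declaration through the closing root tag.
xml_text = match.group(0) if match else None

if match:
    print(match.group(1))  # name of the root element
    print(xml_text)
```

The backreference \1 ties the closing tag to whatever root element name was captured, so the match does not depend on the multipart boundary lines. Note that the pattern hard-codes the exact XML declaration, so a different version or encoding attribute would prevent a match.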

Allan