
I receive the following namespace declarations from a service:

<soapenv:Envelope xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ xmlns:soap=http://www.4cgroup.co.za/soapauth xmlns:gen=http://www.4cgroup.co.za/genericsoap>

When I try to parse this request, I get the following error:

xml.etree.ElementTree.ParseError: not well-formed

I noticed that the namespace values are not wrapped in double quotes. How can I add them with a regular expression?

Expected format:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soap="http://www.4cgroup.co.za/soapauth" xmlns:gen="http://www.4cgroup.co.za/genericsoap">

Note the double quotes around each namespace value.
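A minimal reproduction of the problem, as a sketch: the helper name `is_well_formed` is hypothetical, and a closing tag is appended here only so that each string is a complete document. It shows that the parser rejects the unquoted form and accepts the quoted one.

```python
import xml.etree.ElementTree as ET

# Same envelope, shortened to one namespace; closing tag added so the
# string is a complete document (an assumption for this sketch).
unquoted = (
    "<soapenv:Envelope xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/>"
    "</soapenv:Envelope>"
)
quoted = (
    '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">'
    "</soapenv:Envelope>"
)


def is_well_formed(xml_text):
    """Return True if ElementTree can parse xml_text, False on ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(unquoted))
print(is_well_formed(quoted))
```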

Emmanuel Mtali
  • What is the proper format? You are more likely to get answers if you provide as much detail as possible. As written, the question expects us to know what a SOAP namespace looks like, on top of knowing regex. – Error - Syntactical Remorse Oct 16 '19 at 15:11
  • A quick hack might be to `.replace()` `' xmlns:'` with `'" xmlns:"'`, add a `'"'` at the end and delete the one after `Envelope` – Dan Oct 16 '19 at 15:14
  • Edited my question @Error-SyntacticalRemorse – Emmanuel Mtali Oct 16 '19 at 15:18

2 Answers

1

Using regex:

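import re

# Opening tag as received: the namespace values are missing their quotes.
namespace = "<soapenv:Envelope xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ xmlns:soap=http://www.4cgroup.co.za/soapauth xmlns:gen=http://www.4cgroup.co.za/genericsoap>"

# Match anything URL-shaped: an optional http/https/ftp scheme, then
# characters from a limited set that contain at least one dot.
FIND_URL = re.compile(r"((?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+)")

# Wrap each match in double quotes.
print(FIND_URL.sub(r'"\1"', namespace))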

Output:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soap="http://www.4cgroup.co.za/soapauth" xmlns:gen="http://www.4cgroup.co.za/genericsoap">

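Note that the regex isn't perfect. It works for this case, but its character class omits characters such as `:`, `#` and `&`, so a URL with a port, a fragment, or a multi-parameter query string would be only partially quoted.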
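To illustrate that limitation, here is a sketch with a hypothetical tag whose URL contains a port. It also shows one possible alternative, which anchors on the `xmlns` attribute name instead of guessing what a URL looks like. The names `NS_ATTR` and `quote_ns_values` are made up for this example, and the alternative assumes that unquoted values contain no whitespace.

```python
import re

# Pattern copied from the answer above.
FIND_URL = re.compile(r"((?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+)")

# Hypothetical tag: the ":" before the port is not in the character class.
tag = "<a:Envelope xmlns:a=http://www.example.com:8080/ns>"
print(FIND_URL.sub(r'"\1"', tag))
# → <a:Envelope xmlns:a="http://www.example.com":8080/ns>

# Alternative sketch: match "xmlns=" or "xmlns:prefix=" followed by a run
# of characters that are not whitespace, quotes, or ">".
NS_ATTR = re.compile(r"""(\bxmlns(?::[\w.-]+)?=)([^\s"'>]+)""")


def quote_ns_values(text):
    """Wrap unquoted xmlns values in double quotes; quoted ones are left alone."""
    return NS_ATTR.sub(r'\1"\2"', text)


print(quote_ns_values(tag))
# → <a:Envelope xmlns:a="http://www.example.com:8080/ns">
```

Because the value pattern cannot start with a quote, running `quote_ns_values` on an already-fixed tag leaves it unchanged.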

Credit to this answer

1

This regex seems to do the trick:

import re

nsmap = "<soapenv:Envelope xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ xmlns:soap=http://www.4cgroup.co.za/soapauth xmlns:gen=http://www.4cgroup.co.za/genericsoap>"
# Lazily match each URL up to the next " xmlns" or the closing ">", then quote it.
nsmap = re.sub(r"(https?://.*?)(?=\sxmlns|>)", r'"\1"', nsmap)
print(nsmap)

Output:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soap="http://www.4cgroup.co.za/soapauth" xmlns:gen="http://www.4cgroup.co.za/genericsoap">

Check it out online here.
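On a full message, the same substitution would also quote any URL that appears later in the body. A sketch of one way to limit it to the opening tag and then confirm that the result parses is shown below. The helper name `quote_opening_tag` and the sample body are assumptions, and the sketch assumes the document starts directly with the envelope tag (no XML declaration before it).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical full message: unquoted namespaces plus a URL in the body text.
raw = (
    "<soapenv:Envelope xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ "
    "xmlns:soap=http://www.4cgroup.co.za/soapauth "
    "xmlns:gen=http://www.4cgroup.co.za/genericsoap>"
    "<soapenv:Body>see http://example.com/docs</soapenv:Body>"
    "</soapenv:Envelope>"
)


def quote_opening_tag(xml_text):
    """Apply the substitution to the first tag only, leaving the rest untouched."""
    # Everything up to and including the first ">" is the opening tag.
    head, sep, rest = xml_text.partition(">")
    fixed_head = re.sub(r"(https?://.*?)(?=\sxmlns|>)", r'"\1"', head + sep)
    return fixed_head + rest


root = ET.fromstring(quote_opening_tag(raw))
print(root.tag)      # namespace URI in braces, followed by the local name
print(root[0].text)  # body text, with its URL left as it was
```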

Ahndwoo