Extracting multiple substrings from a string using python

Question

I have an XML string (read from a CSV file) from which I need to extract the parameter names and their values. My idea is to extract every substring that sits between > and </ as a value, and then discard any extracted substring that itself contains < or >.

I have not yet worked out how to extract the parameter names as well. I am using Python.

Below is the XML string that I read from the CSV file.

<ServiceInstanceUpdateAck xmlns='http://www.loremipsum.com/'>
<Context>
    <Attribute name='Atri1'>A1</Attribute>
    <Attribute name='Atri2'>A2</Attribute>
    <Attribute name='Atri3'>A3</Attribute>
</Context>
<Operation>DEPROVISION</Operation>
<Status>SUCCESS</Status>
<Error message='No Error' majorErrorCode='E_NONE' minorErrorCode='E_NONE'/> 
<Service>
  <Id>11233334</Id>
</Service> 
</ServiceInstanceUpdateAck>

Any help is appreciated.

Note: This is not a whole XML document. It is POST body data that I am fetching from a CSV file, so it is basically a string when I fetch it. So I cannot use an XML parser here.

This is not a whole XML document. It is the POST body of a request that I am fetching from a CSV file, so it is basically a string, and I cannot use an XML parser here. – user2606665 Jan 19 '16 at 07:25
Why not use the XML parser on the POST body data of your request? – Tim Pietzcker Jan 19 '16 at 07:40

score 3 · Answer 1 · answered Jan 19 '16 at 07:10

Since your data is simply XML, you could just use an XML parser; documentation can be found here
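
For what it's worth, here is a minimal sketch of that approach using xml.etree.ElementTree from the standard library (chosen here for illustration; any XML parser would do). It parses the string from the question and walks every element without knowing the tag names in advance. The names xml_string, local_name and walk are made up for this example, and stripping the namespace prefix from the tags is an assumption about the output that is wanted.

```python
import xml.etree.ElementTree as ET

# Sample string copied from the question; in practice it would be
# whatever value was read from the CSV cell.
xml_string = """<ServiceInstanceUpdateAck xmlns='http://www.loremipsum.com/'>
<Context>
    <Attribute name='Atri1'>A1</Attribute>
    <Attribute name='Atri2'>A2</Attribute>
    <Attribute name='Atri3'>A3</Attribute>
</Context>
<Operation>DEPROVISION</Operation>
<Status>SUCCESS</Status>
<Error message='No Error' majorErrorCode='E_NONE' minorErrorCode='E_NONE'/>
<Service>
  <Id>11233334</Id>
</Service>
</ServiceInstanceUpdateAck>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def walk(element):
    """Yield (tag, attributes, text) for the element and all its descendants."""
    for node in element.iter():
        text = (node.text or "").strip()
        yield local_name(node.tag), dict(node.attrib), text


root = ET.fromstring(xml_string)  # parses directly from a string
rows = list(walk(root))
for tag, attrs, text in rows:
    print(tag, attrs, text)
```

Each row is a (tag, attributes, text) tuple, so the Attribute elements carry their name in the attributes dict and their value in the text, and nothing in the loop depends on specific tag names.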

Pascal Roessner

Just added a note in my question which makes it more clear. Not able to use xml parser here. – user2606665 Jan 19 '16 at 07:29
You can totally parse XML from a string, at least according to the documentation. That you do not know what the names are is fine, since you can access children recursively and access them using .tag and .attrib – Pascal Roessner Jan 19 '16 at 08:04
Can you refer me to some example? I tried doing that with .tag and .attrib but wasn't successful. – user2606665 Jan 19 '16 at 09:30
I also want to add that parsing XML with regex is not really possible, as has been stated many times on SO. Here is a rather funny but good read: [click](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Pascal Roessner Jan 19 '16 at 11:07

score 0 · Answer 2 · answered Jan 19 '16 at 07:19

Use an XML parser; xml.dom.minidom should suffice:

>>> import xml.dom.minidom
>>> doc = xml.dom.minidom.parseString(xmlString)
>>> for item in doc.getElementsByTagName("Attribute"):
...     print(item.getAttribute("name"))
...     print(item.childNodes[0].wholeText)
Atri1
A1
Atri2
A2
Atri3
A3
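
If the tag names differ from string to string, the same parser can be used without hard-coding any of them. The following is only a sketch: it walks the minidom tree recursively and records every element's tag name, attributes and direct text. The sample string is a shortened, hypothetical variant of the data in the question, and collect is a helper name invented here.

```python
import xml.dom.minidom

# Shortened, hypothetical sample; the real string would come from the CSV.
xml_string = (
    "<Service><Id>11233334</Id>"
    "<Error message='No Error' majorErrorCode='E_NONE'/></Service>"
)


def collect(node, found=None):
    """Recursively gather (tag, attributes, text) for every element below node."""
    if found is None:
        found = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            attrs = dict(child.attributes.items())
            # Join only the text nodes that sit directly inside this element.
            text = "".join(
                c.data for c in child.childNodes if c.nodeType == c.TEXT_NODE
            ).strip()
            found.append((child.tagName, attrs, text))
            collect(child, found)
    return found


doc = xml.dom.minidom.parseString(xml_string)
results = collect(doc)
for tag, attrs, text in results:
    print(tag, attrs, text)
```

Because the function never names a tag, it should work unchanged on strings whose elements are called something else.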

Tim Pietzcker

Hi Tim, I thought of using it, but the problem is that this is just one of the strings. I have multiple strings with different values. Using a parser with hard-coded tag names will only work for one string and not for the others. – user2606665 Jan 19 '16 at 07:22
@user2606665: You only need to know the names if you're looking for specific nodes. You can just as easily parse the whole tree recursively, something you can't do with regexes. – Tim Pietzcker Jan 19 '16 at 07:26
Actually nodes have different names in different strings. They aren't the same for all strings. – user2606665 Jan 19 '16 at 07:29
Yes, which makes the case for an XML parser even more. You don't need to know either the names of the nodes or the structure beforehand. – Tim Pietzcker Jan 19 '16 at 07:39
Alright. I now understand what you're trying to say. I'll try to use xml parser and let you know. – user2606665 Jan 19 '16 at 07:57
Hey, I went through python docs but was not able to understand it clearly. Can you point me to some example or article link if you know any? – user2606665 Jan 19 '16 at 09:22
I tried using your code mentioned above. It doesn't print anything for me. Is it because I am passing it as string? – user2606665 Jan 19 '16 at 09:28
The code works if `xmlString` contains the sample data you posted in your question. – Tim Pietzcker Jan 19 '16 at 09:35
Sorry, my mistake. I debugged and found I was reading another string. Yes, the code works now. But I am still not able to implement XML parsing of the string :( – user2606665 Jan 19 '16 at 09:50
