
I have a large XML file that looks like this:

<data> skdfnlsniisimsoinfsdfoisdfinsdofinodnfonf <emrosem> 23324097234097g </emrosem> 

<peto> oifmisnie </peto>

</data>

<data> sfnseosfnosefoisneofinseionfoaisenfoisen <emrosem> 3249087203470w </emrosem>

<peto> sdfn </peto>

</data>

I want to separate this into a list that looks like this:

 [<data> skdfnlsniisimsoinfsdfoisdfinsdofinodnfonf <emrosem> 23324097234097g </emrosem> 
 <peto> oifmisnie </peto></data>, <data> sfnseosfnosefoisneofinseionfoaisenfoisen             
 <emrosem> 3249087203470w </emrosem> <peto> sdfn </peto> </data>]

In other words, I want to split it based on the word "data".

I'm using Python 2.7. Thanks for the help.

2 Answers


Python's included XML parser is one way to parse XML. It might be a bit kludgy to get the data out of it and into a list with the tags intact, but it should be doable.
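As a rough sketch of what that could look like with the standard library's `xml.etree.ElementTree`: this assumes the file holds several top-level `<data>` elements and no XML declaration, so the text is wrapped in a made-up `<root>` element to make it well-formed before parsing. The function name `split_data_elements` and the shortened sample text are only for illustration, and the whole input is read into memory, which may not suit a very large file.

```python
import xml.etree.ElementTree as ET


def split_data_elements(text):
    """Return each top-level <data> element as a string, tags intact."""
    # Assumption: several sibling <data> elements with no single root,
    # so wrap them in a hypothetical <root> to get well-formed XML.
    root = ET.fromstring("<root>" + text + "</root>")
    chunks = []
    for elem in root.findall("data"):
        # Drop the whitespace that follows </data> so it is not serialized.
        elem.tail = None
        # tostring() returns an ASCII byte string by default; decode to text.
        chunks.append(ET.tostring(elem).decode("ascii"))
    return chunks


# Shortened stand-in for the file contents in the question.
sample = """<data> abc <emrosem> 123g </emrosem>

<peto> xyz </peto>

</data>

<data> def <emrosem> 456w </emrosem>

<peto> uvw </peto>

</data>"""

chunks = split_data_elements(sample)
for chunk in chunks:
    print(chunk)
```

Each list entry starts with `<data>` and ends with `</data>`, with the nested `<emrosem>` and `<peto>` tags kept as they were.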

thegrinner

Please don't use regular expressions for this. If you need to parse XML, use an XML parser. XML just has too many subtleties to handle it with simple string manipulation routines. For a nice explanation as to why, see the first answer to this question.

tdammers