-1

I am currently working on this code that creates a summary from an XML input file. However, the summary picks up irrelevant information that is towards the end of the file. For instance, I want to read all of the lines of the XML file except when it reaches the following string header, "NOTICE TO APPELLANT". I would like to ignore all the lines that come after that string header.

Also, I am reading the file in binary mode because the XML file is not well formed. So is there a way I can do this while reading the file in binary mode.

user9608799
  • 31
  • 1
  • 8
  • this might help you https://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python – Peko Chan Apr 18 '18 at 16:03
  • Hi. I am not sure if it really assists me with what I am looking for. I would like to ignore lines after a certain string, "NOTICE TO APPELLANT" while reading the file in binary mode – user9608799 Apr 18 '18 at 16:16
  • from the link I've sent you can take only the strings needed. but if the file is 'broken', I nice way would be to fix it, if it's not to hard, it will be eaiser to work with, or the answer from @Divyang-Vashi seems to be a nice one. – Peko Chan Apr 18 '18 at 20:35
  • It does not seem to be necessary to use "binary mode" at all. I guess you believe it's that *or* go all the way and use a full XML reader, and you fail to realise there is a third, more logical option: open and read as a plain text file. – Jongware Apr 18 '18 at 21:37
  • @PekoChan: Thanks – user9608799 Apr 19 '18 at 00:30

1 Answers1

0

This is what I have understood from your question: You would like to stop reading a file that you are reading in binary mode once you encounter the sub-string "NOTICE TO APPELLANT". I am confused whether you want to read the line that contains the sub-string after the sub-string. But I assume you don't want to read the lines after the line that contains this sub-string.

   with open("test_xml.xml", "rb") as f:
   ...:     for line in f:
   ...:         if b'NOTICE TO APPELLANT' in line:
   ...:             print(line) #you can replace this function call
   ...:             break
   ...:         else:
   ...:             print(line)

where my input file "test_xml.xml" looked like this...

<note>
    <to>Tove</to>
    <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
        <sometag>NOTICE TO APPELLANT</sometag>

    sd;kfposdkjfpksdf sdk sd
            ALL THESE WONT BE SCANNED/READ
            SDFKSDPFJSDHF
        OHSFOHSD
</note>

Most of the code is simple to understand except the part where I convert str to binary type but that is still not that difficult.