I have an XML file, and I check whether it is well-formed like this:

try:
    self.doc = etree.parse(attributesXMLFilePath)
except IOError:
    error_message = "Error: Couldn't find attribute XML file path {0}".format(attributesXMLFilePath)
    raise XMLFileNotFoundException(error_message)
except XMLSyntaxError:
    error_message = "The file {0} is not a good XML file, recheck please".format(attributesXMLFilePath)
    raise NotGoodXMLFormatException(error_message)

As you can see, I am catching XMLSyntaxError, which is imported from lxml:

from lxml.etree import XMLSyntaxError

That works, but it only tells me that the file is not well-formed XML. Is there a way to find out which tag is wrong? For example, when I have this:

<name>Marco</name1>

I get the error. Is there a way to find out that the name tag has not been closed?

Update

After some people suggested using the line and position, I came up with this code:

class XMLFileNotFoundException(GeneralSpiderException):
    def __init__(self, message):
        super(XMLFileNotFoundException, self).__init__(message, self)

class GeneralSpiderException(Exception):
    def __init__(self, message, e):
        super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))

I am still raising the error like this:

raise XMLFileNotFoundException(error_message)

Now I get this error:

    super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))
exceptions.AttributeError: 'XMLFileNotFoundException' object has no attribute 'lineno'
Marco Dinatsoli

2 Answers

You can print the details of the error. For instance:

try:
    self.doc = etree.parse(attributesXMLFilePath)
except XMLSyntaxError as e:
    error_message = "The file {0} is not correct XML, {1}".format(attributesXMLFilePath, e.msg)
    raise NotGoodXMLFormatException(error_message)
Paulo Almeida
  • This sounds like a good option, but I have (as you can see) a custom exception. I tried using self.lineno inside my custom exception, but got an error that lineno is not defined. – Marco Dinatsoli Aug 28 '15 at 13:30
  • @MarcoDinatsoli I edited my answer to pass the variable to the custom exception – Paulo Almeida Aug 28 '15 at 13:40
  • So you are telling me that I have to pass the line number and the position every time I raise an exception, right? – Marco Dinatsoli Aug 28 '15 at 13:42
  • I don't know all that much about custom exceptions to tell you for sure. Initially I thought all you wanted was to get the details of that particular error. Incidentally, your edit is all about `XMLFileNotFoundException`, but I think it should be `NotGoodXMLFormatException`. – Paulo Almeida Aug 28 '15 at 13:46
  • The two exceptions are the same; I am asking about the general idea rather than a specific exception. – Marco Dinatsoli Aug 28 '15 at 13:48

This might not be exactly what you want, but you can get the exact line and column where the error was detected from the exception:

import lxml.etree
import StringIO
xml_fragment = "<name>Marco</name1>"
#               12345678901234
try:
    lxml.etree.parse(StringIO.StringIO(xml_fragment))
except lxml.etree.XMLSyntaxError as exc:
    line, column = exc.position

In this example, line and column will be 1 and 14, which indicates the first character of the closing tag that doesn't have a matching opening tag.
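
To see which tag the parser is complaining about, the error text can be read together with the position. The following is only a sketch: `describe_xml_error` is a hypothetical helper, and the fallback to the standard library's `xml.etree.ElementTree` is an assumption added so the example also runs where lxml is not installed. With lxml, the message for a mismatched tag usually names both the opening and the closing tag; the exact wording and the reported column depend on the parser and its version.

```python
# Prefer lxml, as in the answer above. Both error types expose a
# `position` tuple of (line, column).
try:
    from lxml import etree
    ParseFailure = etree.XMLSyntaxError
except ImportError:
    # Assumption: stdlib fallback, not part of the original answer.
    import xml.etree.ElementTree as etree
    ParseFailure = etree.ParseError


def describe_xml_error(xml_text):
    """Return None if xml_text parses, else a (message, line, column) tuple."""
    try:
        etree.fromstring(xml_text)
    except ParseFailure as exc:
        line, column = exc.position
        return str(exc), line, column
    return None


# The message is the part that says what went wrong; line and column say where.
print(describe_xml_error("<name>Marco</name1>"))
```

The position tells you where the parser gave up, which is typically at the closing tag, so the opening tag it failed to match has to be taken from the message text.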

chepner
  • This sounds like a good option, but I have (as you can see) a custom exception. I tried using self.lineno inside my custom exception, but got an error that lineno is not defined: `class XMLFileNotFoundException(GeneralSpiderException): def __init__(self, message): print self.lineno super(XMLFileNotFoundException, self).__init__(message, self)` – Marco Dinatsoli Aug 28 '15 at 13:30
  • What is `GeneralSpiderException`? Does that inherit from `XMLSyntaxError`? – chepner Aug 28 '15 at 13:32
  • That is another exception of mine. You shouldn't care about it; it inherits from Exception. – Marco Dinatsoli Aug 28 '15 at 13:33
  • I mean the idea; I did the same print inside the other exception, `NotGoodXMLFormatException`. – Marco Dinatsoli Aug 28 '15 at 13:33
  • I'm not sure what your custom exception has to do with the exception raised by `lxml.etree.parse`. Maybe you want to catch the syntax error, use that information to identify the element, and put *that* information in your custom exception? – chepner Aug 28 '15 at 13:34
  • @MarcoDinatsoli Please edit your question and add information about where exactly you used `exc.position` (or `exc.lineno`) and what is the error. – Paulo Almeida Aug 28 '15 at 13:35
  • I need to print the line number and the position, but I want to print them from my custom exception. In other words, I don't want to print them at the point where the exception happens. I come from a Java background, and my thinking is that if I print them inside my custom exception, I can raise many exceptions without having to rewrite the line number in each one. – Marco Dinatsoli Aug 28 '15 at 13:35
  • I see. But then, if I understand correctly, you have to pass `exc.position` the same way you are passing `error_message` (or included in `error_message`). – Paulo Almeida Aug 28 '15 at 13:37
  • @PauloAlmeida That is exactly what I am trying to avoid. I would like to not pass it every time; if that is not possible, please tell me. – Marco Dinatsoli Aug 28 '15 at 13:40
  • If I understand your update, you need to pass the `XMLSyntaxError` as an argument when you create your custom exception; where else are you going to get the information about the syntax error? – chepner Aug 28 '15 at 14:05
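
The suggestion in the last comment, passing the caught parser error into the custom exception, could look roughly like the sketch below. It is one possible arrangement, not the asker's final code: the `cause` parameter, the `self.cause` attribute, and the `parse_or_raise` helper are hypothetical names, and the standard-library fallback is an assumption added so the sketch runs without lxml. The formatting lives once in the base class, so each `raise` only has to hand over the caught error.

```python
try:
    from lxml import etree
    ParseFailure = etree.XMLSyntaxError
except ImportError:
    # Assumption: stdlib fallback; its ParseError also has `.position`.
    import xml.etree.ElementTree as etree
    ParseFailure = etree.ParseError


class GeneralSpiderException(Exception):
    """Base exception that appends position details taken from `cause`."""

    def __init__(self, message, cause=None):
        # `cause` is the original parser error. Passing `self` here cannot
        # work: the new exception has no line information of its own.
        position = getattr(cause, "position", None)
        if position is not None:
            message += "\nline of Exception = {0}, position of Exception = {1}".format(*position)
        super(GeneralSpiderException, self).__init__(message)
        self.cause = cause


class NotGoodXMLFormatException(GeneralSpiderException):
    pass


def parse_or_raise(xml_text):
    """Parse xml_text, translating parser errors into the custom exception."""
    try:
        return etree.fromstring(xml_text)
    except ParseFailure as exc:
        raise NotGoodXMLFormatException("Not well-formed XML", exc)


try:
    parse_or_raise("<name>Marco</name1>")
except NotGoodXMLFormatException as err:
    print(err)
```

Because `cause` defaults to `None`, an exception raised for an `IOError` (which carries no line or column) can use the same base class and simply omits the position line.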