I have an Ubuntu machine running Python 2.7.6. When I try to use lxml, which was installed with pip, I get the following error:

Traceback (most recent call last):
  File "./export.py", line 44, in fetch_item
    root.append(elem)
  File "lxml.etree.pyx", line 742, in lxml.etree._Element.append     (src/lxml/lxml.etree.c:44339)
  File "apihelpers.pxi", line 24, in lxml.etree._assertValidNode     (src/lxml/lxml.etree.c:14127)
AssertionError: invalid Element proxy at 140443984439416

What does this mean, and how should I go about fixing this?

David542
1 Answer

I had the same issue in a multiprocessing context. It can be illustrated by the following snippet:

from multiprocessing import Pool

import lxml.html


def process(html):
    tree = lxml.html.fromstring(html)
    body = tree.find('.//body')
    print(body)
    return body


def main():
    pool = Pool()
    result = pool.apply(process, ('<html><body/></html>',))
    print(type(result))
    print(result)  


if __name__ == '__main__':
    main()

Running it produces the following output:

<Element body at 0x7f9f690461d8>
<class 'lxml.html.HtmlElement'>
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    main()
  File "test.py", line 14, in main
    print(result)
  File "src/lxml/lxml.etree.pyx", line 1142, in lxml.etree._Element.__repr__ (src/lxml/lxml.etree.c:54748)
  File "src/lxml/lxml.etree.pyx", line 992, in lxml.etree._Element.tag.__get__ (src/lxml/lxml.etree.c:53182)
  File "src/lxml/apihelpers.pxi", line 19, in lxml.etree._assertValidNode (src/lxml/lxml.etree.c:16856)
AssertionError: invalid Element proxy at 139697870845496

The most obvious explanation, given that __repr__ works in the worker process and the return value does reach the calling process, is a deserialisation issue: the element comes back as an lxml.html.HtmlElement, but its proxy is no longer valid in the parent. It can be solved, for example, by returning lxml.html.tostring(body), or any other picklable object.
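
A minimal sketch of that workaround, assuming the parent process only needs the element's content and can re-parse it. The `rebuild` helper and the sample markup are hypothetical names introduced for illustration; they are not part of the original snippet.

```python
from multiprocessing import Pool

import lxml.html


def process(html):
    # Parse in the worker, then serialise the element to bytes,
    # which can be pickled and sent back to the parent.
    tree = lxml.html.fromstring(html)
    body = tree.find('.//body')
    return lxml.html.tostring(body)


def rebuild(serialised):
    # Hypothetical helper: re-parse the serialised markup in the parent
    # and pick out the <body> element again.
    doc = lxml.html.document_fromstring(serialised)
    return doc.find('body')


def main():
    pool = Pool(1)
    try:
        serialised = pool.apply(process, ('<html><body><p>hi</p></body></html>',))
    finally:
        pool.close()
        pool.join()
    body = rebuild(serialised)
    print(body.tag)


if __name__ == '__main__':
    main()
```

The element rebuilt in the parent is a fresh object with no link to the tree in the worker, so any changes have to be made on the parent's copy.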

saaj
Yes, lxml elements cannot be pickled (as of now), so they cannot be transferred between processes by the multiprocessing package. See: https://bugs.launchpad.net/lxml/+bug/736708 – Marcin Raczyński Oct 04 '18 at 14:59