In Scrapy, using lxml, I wrote a custom pipeline to generate XML in the format I need. The XML is generated, but there is a bug: each new group (items) in the list overwrites the previous one. That is, regardless of the len() of the list, only one group (items) is saved. The code is below. Can someone help me?

P.S.: Although it covers the same topic, this is not a duplicate of the question "How do you append to a file in Python?", because there are quirks here such as preserving the XML header and footer.

# -*- coding: utf-8 -*-
from yt.items import Lista
import lxml.etree
import lxml.builder

class ytXmlPipeline(object):

    def process_item(self, item, spider):
        E = lxml.builder.ElementMaker()
        ITEMS = E.items
        CHANNEL = E.channel
        TITLE = E.title
        LOGO = E.logo_30x30
        SINOPSE = E.description
        STREAM = E.stream_url
        lista = ITEMS( 
                CHANNEL(
                    TITLE('<![CDATA['+item["title"]+']]>'),
                    LOGO('<![CDATA['+item["logo_30x30"]+']]>'),
                    SINOPSE('<![CDATA[<center><img height="254" width="200" src="'+item["logo_30x30"]+'"/><p>'+item["description"]+'</p></center>]]>'),
                    STREAM('<![CDATA['+item["stream_url"]+']]>'),
                    )
                )

        # create a new XML file with the results
        mydata = lxml.etree.tostring(lista, encoding='utf-8', pretty_print=True, xml_declaration=True, method="xml")
        mydata = mydata.replace('&lt;','<').replace('&gt;','>')
        myfile = open("ytLista.xml", "w")
        myfile.write(mydata)
  • I deleted my answer and the duplicate flag. My initial response addressed the overwrite/append issue, but I believe your real goal is better served by a proper lxml-based solution (as opposed to hacking an XML writer across the pipeline's `open_spider()` and `close_spider()`). A better solution would let lxml handle opening, parsing, and closing the file. You would then use lxml's `etree` to insert nodes. This might be a starting point: https://stackoverflow.com/questions/3648689/python-lxml-append-a-existing-xml-with-new-data/3648728 – malberts Feb 20 '19 at 13:38
  • @malberts. Thanks!! – Antonio Oliveira Feb 20 '19 at 14:35
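
One way to read the comment above is sketched below. It is a minimal sketch, not a tested drop-in for the question's pipeline. The overwrite happens because `process_item` runs once per item and each call builds a fresh `items` root and reopens the file in "w" mode, which truncates it. The sketch keeps a single lxml tree in memory and writes it once at the end, rather than re-parsing the file as in the linked answer. The class name `YtXmlPipeline` and the `serialize()` helper are hypothetical, the code assumes Python 3, and items are assumed to be dict-like with the four keys used in the question. It relies on Scrapy's `open_spider()` and `close_spider()` pipeline hooks only to create the root and write the file.

```python
import lxml.etree as etree


class YtXmlPipeline:
    # Hypothetical rewrite of the question's ytXmlPipeline (a sketch only).

    output_path = "ytLista.xml"  # same file name as in the question

    def open_spider(self, spider):
        # Create the single <items> root once per crawl, not once per item.
        self.root = etree.Element("items")

    def process_item(self, item, spider):
        # Append one <channel> per scraped item to the shared root.
        channel = etree.SubElement(self.root, "channel")
        description = (
            '<center><img height="254" width="200" src="{logo}"/>'
            "<p>{text}</p></center>"
        ).format(logo=item["logo_30x30"], text=item["description"])
        fields = (
            ("title", item["title"]),
            ("logo_30x30", item["logo_30x30"]),
            ("description", description),
            ("stream_url", item["stream_url"]),
        )
        for tag, value in fields:
            # etree.CDATA makes lxml emit a real CDATA section, so the
            # '&lt;' / '&gt;' string replacement is no longer needed.
            etree.SubElement(channel, tag).text = etree.CDATA(value)
        return item  # pass the item on to any later pipeline

    def serialize(self):
        # Serialize the whole tree, with one XML declaration at the top.
        return etree.tostring(
            self.root,
            encoding="utf-8",
            xml_declaration=True,
            pretty_print=True,
        )

    def close_spider(self, spider):
        # Write the file once, in binary mode, after all items are collected.
        with open(self.output_path, "wb") as xml_file:
            xml_file.write(self.serialize())
```

Because the root lives on the pipeline instance, the XML declaration and the closing `</items>` tag are emitted exactly once, which sidesteps the header and footer problem that plain file appending would cause. The trade-off is that the whole list is held in memory until the spider closes.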
