When scraping, the order of the fields in my output does not match the order in which I assign them in the spider / item file.

For example:

    def parse(self, response):

        complete_article = response.xpath('//div[@class="storywrapper"]')

        for article in complete_article:
            dachzeile = article.xpath('.//div[@class="meldungHead"]/h1/...
            headline = article.xpath('.//div[@class="meldungHead"]/h1/...
            date = article.xpath('//meta[@name="date"]...
            datum = date.split("T")[0]
            uhrzeit = date.split("T")[1]
            ueberschrift = article.xpath('.//div[@class="mod ....
            text = article.xpath('//div[@class="storywra...
            relative_image = article.xpath('//div[@class="media ...
            final_image = self.base_url + relative_image
            url = response.url.encode('utf-8')

            items = testItem()

            items['Dachzeile'] = dachzeile
            items['Titel'] = headline
            items['Datum'] = datum
            items['Zeit'] = uhrzeit
            items['Einleitung'] = ueberschrift
            items['Artikel'] = text
            items['Bild'] = final_image
            items['Adresse'] = url

            yield items

But the output in the JSON file looks like this:

[
  {
    "Artikel": "....",
    "Einleitung": "...",
    "Titel": "...",
    "Zeit": "19:43:10",
    "Datum": "2020-03-28",
    "Adresse": "....html",
    "Bild": "...",
    "Dachzeile": "..."
  }
]

How do I set the field order for the output file?

Best regards and thanks in advance!

Rhinozeros

1 Answer

You can yield a collections.OrderedDict instead of the item object; it keeps the fields in the order you assign them:

from collections import OrderedDict  # at the top of the spider module

    # inside parse():
    for article in complete_article:
        # ... your existing extraction code ...

        items = OrderedDict()

        items['Dachzeile'] = dachzeile
        items['Titel'] = headline
        items['Datum'] = datum
        items['Zeit'] = uhrzeit
        items['Einleitung'] = ueberschrift
        items['Artikel'] = text
        items['Bild'] = final_image
        items['Adresse'] = url

        yield items
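
To check the ordering outside Scrapy, here is a minimal standalone sketch. It assumes the standard library's json module as a stand-in for Scrapy's JSON exporter, so it only illustrates that an OrderedDict is serialized in insertion order; build_item and the sample values are hypothetical and not part of the spider above.

```python
import json
from collections import OrderedDict


def build_item(dachzeile, headline, datum, uhrzeit):
    """Hypothetical helper: collect fields in the desired output order."""
    item = OrderedDict()
    item['Dachzeile'] = dachzeile
    item['Titel'] = headline
    item['Datum'] = datum
    item['Zeit'] = uhrzeit
    return item


# Placeholder values, used only for illustration.
item = build_item('kicker', 'headline', '2020-03-28', '19:43:10')

# json.dumps does not sort keys by default, so the insertion order is kept.
serialized = json.dumps(item)
print(serialized)
```

If the keys come out in the order they were assigned here, the same assignment order inside the spider loop should be what reaches the exporter.
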
Umair Ayub