I am importing the Scrapy item keys from items.py into pipelines.py. The problem is that the order of the imported keys differs from the order in which they were defined in items.py.

My items.py file:

class NewAdsItem(Item):
    AdId        = Field()
    DateR       = Field()
    AdURL       = Field()

In my pipelines.py:

from adbot.items import NewAdsItem
...
def open_spider(self, spider):
     self.ikeys = NewAdsItem.fields.keys()
     print("Keys in pipelines: \t%s" % ",".join(self.ikeys) )
     #self.createDbTable(ikeys)

The output is:

Keys in pipelines:  AdId,AdURL,DateR

instead of the expected: AdId,DateR,AdURL.

How can I ensure that the imported order remains the same?

Note: This might be related to "How to get order of fields in Scrapy item", but it's not at all clear what's going on, since the Python 3 docs state that lists and dicts should retain their order. Also note that when using process_item() and item.keys(), the order is retained! But I need to access the keys in order before any items are scraped.

not2qubit

2 Answers

The only way I could get this to work was to use this solution in the following manner.

My items.py file:

from scrapy.item import Item, Field
from collections import OrderedDict
from types import FunctionType

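class StaticOrderHelper(type):
    """Metaclass that records class attributes in definition order."""

    # Requires Python3
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Use an ordered mapping as the namespace of the class body
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        # Skip names that start or end with '__', and skip functions/methods
        namespace['_field_order'] = [
            k
            for k, v in namespace.items()
            if not k.startswith('__') and not k.endswith('__')
            and not isinstance(v, (FunctionType, classmethod, staticmethod))
        ]
        return type.__new__(mcls, name, bases, namespace, **kwargs)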

class NewAdsItem(metaclass=StaticOrderHelper):
    AdId        = Field()
    DateR       = Field()
    AdURL       = Field()

Then import the _field_order list into your pipelines.py with:

...
from adbot.items import NewAdsItem
...
class DbPipeline(object):
    ikeys = NewAdsItem._field_order
    ...
    def createDbTable(self):
        print("Creating new table: %s" % self.dbtable )
        print("Keys in createDbTable: \t%s" % ",".join(self.ikeys) )
        ...

I can now create new DB tables with the columns in their order of appearance, without worrying about the field keys being reordered unexpectedly.
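The same idea can be sketched without Scrapy installed. The block below is a minimal illustration under stated assumptions, not Scrapy's own code: `Field` is a stand-in for `scrapy.Field` (which is likewise a plain `dict` subclass), and `FieldOrderMeta` is a hypothetical name. On Python 3.6+ the class-body namespace already keeps definition order, so this variant needs no `__prepare__`. It also shows that `dir()` returns names alphabetically, which matches the `AdId,AdURL,DateR` output seen in the question. Like the class above, the item here does not subclass Scrapy's `Item`.

```python
# Stand-in for scrapy.Field so the sketch runs without Scrapy installed
# (assumption: only the "is a Field" check matters here).
class Field(dict):
    pass


class FieldOrderMeta(type):
    """Hypothetical metaclass that records Field attributes in definition order."""

    def __new__(mcls, name, bases, namespace, **kwargs):
        # On Python 3.6+ the class-body namespace preserves definition order.
        namespace["_field_order"] = [
            key for key, value in namespace.items() if isinstance(value, Field)
        ]
        return super().__new__(mcls, name, bases, namespace, **kwargs)


class NewAdsItem(metaclass=FieldOrderMeta):
    AdId = Field()
    DateR = Field()
    AdURL = Field()


# dir() returns attribute names sorted alphabetically, losing definition order.
via_dir = [
    n for n in dir(NewAdsItem) if isinstance(getattr(NewAdsItem, n, None), Field)
]

print("definition order:", NewAdsItem._field_order)
print("dir() order:     ", via_dir)
```

Filtering on `isinstance(value, Field)` instead of excluding dunder names and methods keeps the list limited to fields, at the cost of ignoring any non-Field attribute you might also want as a column.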

not2qubit

A simple fix is to define a keys() method in your Item class:

class MyItem(Item):
    foo = Field()
    bar = Field()
    gar = Field()
    cha = Field()

    def keys(self):
        # in your preferred order
        return ['cha', 'gar', 'bar', 'foo']
Granitosaurus
  • This doesn't work. I still get the alphabetical order `AdId,AdURL,DateR`. Are you using Python2? (I am using Python3). – not2qubit Oct 09 '18 at 06:55
  • Hmm, I only used this ages ago; maybe newer versions of Scrapy no longer support overriding keys. I'm going to test it. – Granitosaurus Oct 09 '18 at 07:14
  • You can test it from the scrapy shell by importing items.py. It has no effect at all, at least for me. – not2qubit Oct 09 '18 at 07:45
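
One possible explanation for the behaviour reported in these comments: the question reads the class-level `fields` mapping (`NewAdsItem.fields.keys()`), whereas an overridden `keys()` is an instance method and is only consulted when called on an item object. The sketch below uses a hypothetical stand-in class rather than Scrapy's real `Item`, with the class-level mapping written out in alphabetical order to mimic the output from the question.

```python
class MyItem:
    # Hypothetical stand-in for a Scrapy item: a class-level mapping of
    # field names, listed alphabetically as in the question's output.
    fields = {"bar": None, "cha": None, "foo": None, "gar": None}

    def keys(self):
        # Preferred order, as in the answer above.
        return ["cha", "gar", "bar", "foo"]


# Reading the class attribute never calls the overridden method.
class_level = list(MyItem.fields.keys())

# Only an instance call picks up the custom order.
instance_level = MyItem().keys()

print("class-level:   ", class_level)
print("instance-level:", instance_level)
```

If that is what happens, the override can only help in places that call `item.keys()` on an instance, such as inside process_item(), and not in open_spider() before any item exists.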