I'm interested in keeping track of the order of the field names in a Scrapy item. Where is this stored?

>>> dir(item)
Out[7]: 
['_MutableMapping__marker',
 '__abstractmethods__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__doc__',
 '__eq__',
 '__format__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__hash__',
 '__init__',
 '__iter__',
 '__len__',
 '__metaclass__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_cache',
 '_abc_negative_cache',
 '_abc_negative_cache_version',
 '_abc_registry',
 '_class',
 '_values',
 'clear',
 'copy',
 'fields',
 'get',
 'items',
 'iteritems',
 'iterkeys',
 'itervalues',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

I tried item.keys(), but that returns the keys in no particular order.

user1592380
  • I think `Scrapy` doesn't keep order of fields in item (but I'm not sure). – furas Dec 23 '16 at 01:21
  • Actually (as of today), if you're inside `process_item(self, item, spider)` (in your `pipelines.py`), then `item.keys()` retains the order found in the `items.py` file. But if you instead try to get the item *keys* by importing the class from `items.py`, the order you get from `self.ikeys = YourItem.fields.keys()` is *alphabetical*. The answer below did not resolve this issue. – not2qubit Oct 10 '18 at 08:44

1 Answer

The Item class has a dict interface and stores its values in the _values dict, which does not keep track of the key order (https://github.com/scrapy/scrapy/blob/1.5/scrapy/item.py#L53). I believe you could subclass Item and override the __init__ method to make that container an OrderedDict:

from collections import OrderedDict

from scrapy import Field, Item


class OrderedItem(Item):
    def __init__(self, *args, **kwargs):
        # Store values in an OrderedDict so assignment order is preserved
        self._values = OrderedDict()
        if args or kwargs:  # avoid creating dict for most common case
            # dict.items() works on Python 2 and 3, so six is not needed
            for k, v in dict(*args, **kwargs).items():
                self[k] = v

The item then preserves the order in which the values were assigned:

In [28]: class SomeItem(OrderedItem):
    ...:     a = Field()
    ...:     b = Field()
    ...:     c = Field()
    ...:     d = Field()
    ...: 
    ...: i = SomeItem()
    ...: i['b'] = 'bbb'
    ...: i['a'] = 'aaa'
    ...: i['d'] = 'ddd'
    ...: i['c'] = 'ccc'
    ...: i.items()
    ...: 
Out[28]: [('b', 'bbb'), ('a', 'aaa'), ('d', 'ddd'), ('c', 'ccc')]
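
If what you need is the declared field order rather than the assignment order, one option is to write that order down explicitly and reorder the values before handing them on. The following is only a minimal sketch that does not touch Scrapy itself: it assumes the item can be treated as a plain mapping, and FIELD_ORDER and order_item are hypothetical names made up for illustration.

```python
from collections import OrderedDict

# Hypothetical: the desired field order, written out by hand.
FIELD_ORDER = ["a", "b", "c", "d"]


def order_item(item, field_order=FIELD_ORDER):
    """Return an OrderedDict holding the item's populated keys in field_order."""
    return OrderedDict((key, item[key]) for key in field_order if key in item)


# A plain dict stands in for a populated item here.
item = {"b": "bbb", "a": "aaa", "d": "ddd"}
ordered = order_item(item)
print(list(ordered.items()))  # [('a', 'aaa'), ('b', 'bbb'), ('d', 'ddd')]
```

Fields that were never assigned (c in this case) are simply skipped, so the result can be passed to anything that accepts an ordered mapping.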
elacuesta
  • Thank you, that's a great answer. This is a follow-up to your answer to http://stackoverflow.com/questions/41273314/database-insertion-fails-without-error-with-scrapy . I want to build a "plug and play" pipeline object that I can drop into future Scrapy projects ( http://stackoverflow.com/questions/41106509/sqlalchemy-dynamically-create-table-from-scrapy-item ). With the dataset package you can insert a dict, but you lose the order of the item fields. – user1592380 Dec 23 '16 at 14:42
  • I want to find a way to keep track of that order when inserting a record. I realized that the dataset package will support the insertion of ordered dicts , but I need a way to pass the keys in the correct order based on the item field order. – user1592380 Dec 23 '16 at 14:42
  • I guess you could pass the `_values` attribute of the `OrderedItem` instance, since it's an `OrderedDict`. But I'm not sure what your expected result is, I'm not aware of any databases which would "preserve the order of the fields". If you're using SQL (I think you are, you mentioned SQLite in your previous question), you can always specify the field order in your `SELECT` statements. – elacuesta Dec 23 '16 at 18:00
  • True, I export the info to CSV files, so it really shouldn't matter. But while developing I like to control the order of the fields within the SQLite table in my GUI for easy comparison of specific fields. – user1592380 Dec 23 '16 at 18:22
  • Right, then writing a custom query specifying the order of the fields, instead of just `select * from ...` is the way to go IMHO :-) – elacuesta Dec 23 '16 at 18:53
  • Thanks again, very helpful. – user1592380 Dec 23 '16 at 20:22
  • FYI, you got me thinking and I realized that the PyCharm database window which I use (https://www.jetbrains.com/help/pycharm/2016.1/database-tool-window.html) has drag-and-drop columns! That should solve my problem. – user1592380 Dec 23 '16 at 21:37
  • @elacuesta That GitHub link is not valid: to keep it permanent, it needs to point to a specific commit, otherwise it will change every time someone changes that file in the `master` branch. – not2qubit Oct 10 '18 at 08:23
  • True, thanks for pointing that out. I updated it to point to the 1.5 tag, which was not existent at the time I originally answered the question, but that section of the code has remained unchanged. – elacuesta Oct 10 '18 at 22:49
  • This code does not work; there is an error: `for k, v in six.iteritems(dict(*args, **kwargs)): NameError: name 'six' is not defined` – krokodilko Mar 04 '23 at 15:02
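
For the suggestion in the comments to name the columns in the query instead of using `select * from ...`, here is a rough sketch using Python's built-in sqlite3 module. The items table, its contents and the COLUMNS list are assumptions made up for illustration, not taken from the asker's project.

```python
import sqlite3

# Hypothetical column order to display; not taken from the original project.
COLUMNS = ["b", "a", "d", "c"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.execute(
    "INSERT INTO items (a, b, c, d) VALUES (?, ?, ?, ?)",
    ("aaa", "bbb", "ccc", "ddd"),
)

# Column names cannot be bound as ? parameters, so build the column list
# only from names you control yourself.
query = "SELECT {} FROM items".format(", ".join(COLUMNS))
cursor = conn.execute(query)

names = [col[0] for col in cursor.description]
row = cursor.fetchone()
print(names)  # ['b', 'a', 'd', 'c']
print(row)    # ('bbb', 'aaa', 'ddd', 'ccc')
conn.close()
```

The result columns come back in the order listed in the SELECT, regardless of the order in which the table was created.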