5

The default order in scrapy is alphabet,i have read some post to use OrderedDict to output item in customized order.
I write a spider follow the webpage.
How to get order of fields in Scrapy item

My items.py.

import scrapy
from collections import OrderedDict


class OrderedItem(scrapy.Item):
    def __init__(self, *args, **kwargs):
        self._values = OrderedDict()
        if args or kwargs:  
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v

class StockinfoItem(OrderedItem):
    name = scrapy.Field()
    phone = scrapy.Field()
    address = scrapy.Field()

The simple spider file.

import scrapy
from info.items import InfoItem

class InfoSpider(scrapy.Spider):
    name = 'Info'
    allowed_domains = ['quotes.money.163.com']
    start_urls = [ "http://quotes.money.163.com/f10/gszl_600023.html"]
    def parse(self, response):
        item = InfoItem()
        item["name"] = response.xpath('/html/body/div[2]/div[4]/table/tr[2]/td[2]/text()').extract()
        item["phone"] = response.xpath('/html/body/div[2]/div[4]/table/tr[7]/td[4]/text()').extract()
        item["address"] = response.xpath('/html/body/div[2]/div[4]/table/tr[2]/td[4]/text()').extract()
        item.items()
        yield  item

The scrapy info when to run the spider.

2019-04-25 13:45:01 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'address': ['浙江省杭州市天目山路152号浙能大厦'],'name': ['浙能电力'],'phone': ['0571-87210223']}

Why i can't get such desired order as below?

{'name': ['浙能电力'],'phone': ['0571-87210223'],'address': ['浙江省杭州市天目山路152号浙能大厦']}

Thank for Gallaecio's advice, to add the following in settings.py.

FEED_EXPORT_FIELDS=['name','phone','address']

Execute the spider and output to csv file.

scrapy crawl  info -o  info.csv

The field order is in my customized order.

cat info.csv
name,phone,address
浙能电力,0571-87210223,浙江省杭州市天目山路152号浙能大

Look at the scrapy's debug info :

2019-04-26 00:16:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'address': ['浙江省杭州市天目山路152号浙能大厦'],
 'name': ['浙能电力'],
 'phone': ['0571-87210223']}

How can i make the debug info in customized order?How to get the following debug output?

2019-04-26 00:16:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'name': ['浙能电力'],
 'phone': ['0571-87210223'],
 'address': ['浙江省杭州市天目山路152号浙能大厦'],}
showkey
  • 482
  • 42
  • 140
  • 295
  • 3
    You can customize the field order in an actual output file. See [FEED_EXPORT_FIELDS](https://doc.scrapy.org/en/latest/topics/feed-exports.html#std:setting-FEED_EXPORT_FIELDS). – Gallaecio Apr 25 '19 at 14:14

4 Answers4

3

Problem is in __repr__ function of Item. Originally its code is:

def __repr__(self):
    return pformat(dict(self))

So even if you convert your item to OrderedDict and expect fields to be saved in the same order, this function applies dict() to it and breaks the order.

So, I propose you to overload it in the way you like, for example:

import json

class OrderedItem(scrapy.Item):
    def __init__(self, *args, **kwargs):
        self._values = OrderedDict()
        if args or kwargs:
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v

    def __repr__(self):
        return json.dumps(OrderedDict(self), ensure_ascii = False)  # it should return some string

And now you can get this output:

2019-04-30 18:56:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{"name": ["\u6d59\u80fd\u7535\u529b"], "phone": ["0571-87210223"], "address": ["\u6d59\u6c5f\u7701\u676d\u5dde\u5e02\u5929\u76ee\u5c71\u8def152\u53f7\u6d59\u80fd\u5927\u53a6"]}
vezunchik
  • 3,669
  • 3
  • 16
  • 25
1

you can define a custom string representation of your item

class InfoItem:
    def __repr__(self):
      return 'name: {}, phone: {}, address: {}'.format(self['name'], self.['phone'], self.['address'])
Raphael
  • 1,731
  • 2
  • 7
  • 23
0

In your spider replace item.items() with self.log(item.items()), log msg should be list of tuples in order you assigned them in your spider.

Another way is to combine answer you mentioned in your post with this answer

0

The whole items.py which can output customized dubug info in cjk apperance is as below.

import scrapy
import json    
from collections import OrderedDict

class OrderedItem(scrapy.Item):
    def __init__(self, *args, **kwargs):
        self._values = OrderedDict()
        if args or kwargs:
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v

    def __repr__(self):
        return json.dumps(OrderedDict(self),ensure_ascii = False)  
        #ensure_ascii = False ,it make characters show in cjk appearance.

class StockinfoItem(OrderedItem):
    name = scrapy.Field()
    phone = scrapy.Field()
    address = scrapy.Field()
showkey
  • 482
  • 42
  • 140
  • 295