
I'm trying to append a string to the indexes array inside the parse function, but when I try to save it to a .json file, the file ends up empty.

import scrapy
import json

class NewsBrief(scrapy.Spider):
    name = "briefs"
    indexes = []
    def start_requests(self):
        ids = []
        url = "url"

        with open('test_id.json') as json_data:
            ids = json.load(json_data)

        for i in ids:
            yield scrapy.http.FormRequest(url=url+str(i), callback=self.parse)

        #self index is empty here
        print(self.indexes)

        with open('data_briefs.json', 'w') as outfile:
            json.dump(self.indexes, outfile)

    def parse(self, response):
        sentence = ""
        for span in enumerate(response.xpath('//div[@class="newsread olnr"]/p/text()').getall()):
            sentence += str(span[1]).replace('\n', ' ').replace('\r', ' ')
        self.indexes.append(sentence)
billy

1 Answer

The variable self.indexes will not be filled right after the loop that yields the requests: at that point Scrapy has only scheduled them, and none of the requests have actually been executed yet.

If you don't want to use Scrapy's built-in feed export, you can move the file writing into a function that runs when the spider closes. Check the details here: scrapy: Call a function when a spider quits

You need to bind the spider_closed signal to a function and write the file there.

vezunchik