
Hi everyone. I have this problem: I want to scrape multiple URLs in the same domain and save the results to a JSON file, but the output contains the results from the last URL repeated n times.

Maybe an example will help me explain.

UPDATE WITH THE REAL CODE. This is my code:

import scrapy

class Test(scrapy.Spider):
    name= "testscraper"
    allowed_domains=['ebird.org']
    start_urls=[
            'https://ebird.org/species/ostric2',
            'https://ebird.org/species/ostric3',
            'https://ebird.org/species/y00934', 
            'https://ebird.org/species/grerhe1', 
            'https://ebird.org/species/lesrhe2'
    ]
    def start_requests(self):
        for url in self.start_urls:
            print('---------------------------')
            print(url)
            print('---------------------------')
            yield scrapy.Request(url=url,callback=self.parse,dont_filter=True)
 

    def parse(self,response):
        print('***************************')
        print(response.url)
        print('***************************')
        image = response.css('img').xpath('@src').get()
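        # Take the last path segment; a fixed [-7:] slice would return '/y00934' for 6-character codes.
        code = response.url.rstrip('/').rsplit('/', 1)[-1]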
        common_name=response.xpath('//span[@class="Heading-main Media--hero-title"]//text()').get()
        scientific_name=response.xpath('//span[@class="Heading-sub Heading-sub--sci Heading-sub--custom u-text-4-loose"]//text()').get()
        description=response.xpath('//p[@class="u-stack-sm"]/text()').get()
        if description:
            description=description.split('\n',1)[0]

        yield {
            'code':code,
            'scientific_name':scientific_name,
            'common_name':common_name,
            'description':description,
            'image':image,
            'url':response.url
        }

So when I run:

scrapy crawl testscraper -O testscraper.json

The file testscraper.json contains:

[
{"code": "lesrhe2", "scientific_name": "Rhea pennata", "common_name": "Lesser Rhea", "description": "This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.", "image": "https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800", "audio": "assetId\":524686", "url": "https://ebird.org/species/lesrhe2"},
{"code": "lesrhe2", "scientific_name": "Rhea pennata", "common_name": "Lesser Rhea", "description": "This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.", "image": "https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800", "audio": "assetId\":524686", "url": "https://ebird.org/species/lesrhe2"},
{"code": "lesrhe2", "scientific_name": "Rhea pennata", "common_name": "Lesser Rhea", "description": "This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.", "image": "https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800", "audio": "assetId\":524686", "url": "https://ebird.org/species/lesrhe2"},
{"code": "lesrhe2", "scientific_name": "Rhea pennata", "common_name": "Lesser Rhea", "description": "This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.", "image": "https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800", "audio": "assetId\":524686", "url": "https://ebird.org/species/lesrhe2"},
{"code": "lesrhe2", "scientific_name": "Rhea pennata", "common_name": "Lesser Rhea", "description": "This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.", "image": "https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800", "audio": "assetId\":524686", "url": "https://ebird.org/species/lesrhe2"}
]

That is the last dict repeated 5 times, once for every URL.

I searched for help and somebody recommended this setting:

DUPEFILTER_CLASS = 'scrapy.dupefilters.BaseDupeFilter'

But it still does not work.

I usually don't ask for help, but I don't really understand what is happening. Maybe it is a silly thing, but I don't see it. Please give me a hint if you know what is happening.

Current settings:

DOWNLOAD_DELAY = 30
DUPEFILTER_CLASS = 'scrapy.dupefilters.BaseDupeFilter'
CONCURRENT_REQUESTS_PER_DOMAIN=1

And the logs:

2021-08-17 00:32:16 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: ebird)
2021-08-17 00:32:16 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.5 (default, Sep  4 2020, 07:30:14) - [GCC 7.3.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k  25 Mar 2021), cryptography 3.4.7, Platform Linux-5.4.0-80-generic-x86_64-with-glibc2.10
2021-08-17 00:32:16 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2021-08-17 00:32:16 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'ebird',
 'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
 'DOWNLOAD_DELAY': 10,
 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
 'NEWSPIDER_MODULE': 'ebird.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['ebird.spiders']}
2021-08-17 00:32:16 [scrapy.extensions.telnet] INFO: Telnet Password: 00b83b5e4e0bedd7
2021-08-17 00:32:16 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2021-08-17 00:32:16 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-08-17 00:32:16 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-08-17 00:32:16 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2021-08-17 00:32:16 [scrapy.core.engine] INFO: Spider opened
2021-08-17 00:32:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-08-17 00:32:16 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
---------------------------
https://ebird.org/species/ostric2
---------------------------
---------------------------
https://ebird.org/species/ostric3
---------------------------
---------------------------
https://ebird.org/species/y00934
---------------------------
---------------------------
https://ebird.org/species/grerhe1
---------------------------
---------------------------
https://ebird.org/species/lesrhe2
---------------------------
2021-08-17 00:32:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ebird.org/robots.txt> (referer: None)
2021-08-17 00:32:29 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en> from <GET https://ebird.org/species/ostric2>
2021-08-17 00:32:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://secure.birds.cornell.edu/robots.txt> (referer: None)
2021-08-17 00:32:40 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en> from <GET https://ebird.org/species/ostric3>
2021-08-17 00:32:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en> from <GET https://ebird.org/species/y00934>
2021-08-17 00:33:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en> from <GET https://ebird.org/species/grerhe1>
2021-08-17 00:33:15 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en> from <GET https://ebird.org/species/lesrhe2>
2021-08-17 00:33:16 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 2 pages/min), scraped 0 items (at 0 items/min)
2021-08-17 00:33:28 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/login/cas?portal=ebird> from <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en>
2021-08-17 00:33:41 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/login/cas?portal=ebird> from <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en>
2021-08-17 00:33:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/login/cas?portal=ebird> from <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en>
2021-08-17 00:34:05 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/login/cas?portal=ebird> from <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en>
2021-08-17 00:34:16 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-08-17 00:34:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/login/cas?portal=ebird> from <GET https://secure.birds.cornell.edu/cassso/login?service=https%3A%2F%2Febird.org%2Flogin%2Fcas%3Fportal%3Debird&gateway=true&locale=en>
2021-08-17 00:34:32 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/ebird/species/lesrhe2> from <GET https://ebird.org/login/cas?portal=ebird>
2021-08-17 00:34:42 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/ebird/species/lesrhe2> from <GET https://ebird.org/login/cas?portal=ebird>
2021-08-17 00:34:53 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/ebird/species/lesrhe2> from <GET https://ebird.org/login/cas?portal=ebird>
2021-08-17 00:35:14 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/ebird/species/lesrhe2> from <GET https://ebird.org/login/cas?portal=ebird>
2021-08-17 00:35:15 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/ebird/species/lesrhe2> from <GET https://ebird.org/login/cas?portal=ebird>
2021-08-17 00:35:16 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-08-17 00:35:21 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/species/lesrhe2> from <GET https://ebird.org/ebird/species/lesrhe2>
2021-08-17 00:35:34 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/species/lesrhe2> from <GET https://ebird.org/ebird/species/lesrhe2>
2021-08-17 00:35:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/species/lesrhe2> from <GET https://ebird.org/ebird/species/lesrhe2>
2021-08-17 00:36:03 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/species/lesrhe2> from <GET https://ebird.org/ebird/species/lesrhe2>
2021-08-17 00:36:16 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://ebird.org/species/lesrhe2> from <GET https://ebird.org/ebird/species/lesrhe2>
2021-08-17 00:36:16 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-08-17 00:36:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ebird.org/species/lesrhe2> (referer: None)
***************************
https://ebird.org/species/lesrhe2
***************************
2021-08-17 00:36:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ebird.org/species/lesrhe2>
{'code': 'lesrhe2', 'scientific_name': 'Rhea pennata', 'common_name': 'Lesser Rhea', 'description': 'This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.', 'image': 'https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800', 'audio': 'assetId":524686', 'url': 'https://ebird.org/species/lesrhe2'}
2021-08-17 00:36:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ebird.org/species/lesrhe2> (referer: None)
***************************
https://ebird.org/species/lesrhe2
***************************
2021-08-17 00:36:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ebird.org/species/lesrhe2>
{'code': 'lesrhe2', 'scientific_name': 'Rhea pennata', 'common_name': 'Lesser Rhea', 'description': 'This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.', 'image': 'https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800', 'audio': 'assetId":524686', 'url': 'https://ebird.org/species/lesrhe2'}
2021-08-17 00:36:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ebird.org/species/lesrhe2> (referer: None)
***************************
https://ebird.org/species/lesrhe2
***************************
2021-08-17 00:36:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ebird.org/species/lesrhe2>
{'code': 'lesrhe2', 'scientific_name': 'Rhea pennata', 'common_name': 'Lesser Rhea', 'description': 'This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.', 'image': 'https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800', 'audio': 'assetId":524686', 'url': 'https://ebird.org/species/lesrhe2'}
2021-08-17 00:37:16 [scrapy.extensions.logstats] INFO: Crawled 5 pages (at 3 pages/min), scraped 3 items (at 3 items/min)
2021-08-17 00:37:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ebird.org/species/lesrhe2> (referer: None)
***************************
https://ebird.org/species/lesrhe2
***************************
2021-08-17 00:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ebird.org/species/lesrhe2>
{'code': 'lesrhe2', 'scientific_name': 'Rhea pennata', 'common_name': 'Lesser Rhea', 'description': 'This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.', 'image': 'https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800', 'audio': 'assetId":524686', 'url': 'https://ebird.org/species/lesrhe2'}
2021-08-17 00:37:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ebird.org/species/lesrhe2> (referer: None)
***************************
https://ebird.org/species/lesrhe2
***************************
2021-08-17 00:37:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://ebird.org/species/lesrhe2>
{'code': 'lesrhe2', 'scientific_name': 'Rhea pennata', 'common_name': 'Lesser Rhea', 'description': 'This flightless South American relative of the Ostrich stands about 5 feet tall with a body about the size of a sheep; no similar species in its range. Rheas roam widely on open Patagonian steppe and also occur locally in open habitats of the Andes, mainly at very high elevations. Can be confiding where used to people, but in other areas wary, running strongly and quickly. Rheas occur singly or in groups, and males take care of the young. Adults have bold pale spots on the body, first-year birds are plainer overall.', 'image': 'https://cdn.download.ams.birds.cornell.edu/api/v1/asset/115691341/1800', 'audio': 'assetId":524686', 'url': 'https://ebird.org/species/lesrhe2'}
2021-08-17 00:37:32 [scrapy.core.engine] INFO: Closing spider (finished)
2021-08-17 00:37:32 [scrapy.extensions.feedexport] INFO: Stored json feed (5 items) in: birdscraper.json
2021-08-17 00:37:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 7672,
 'downloader/request_count': 27,
 'downloader/request_method_count/GET': 27,
 'downloader/response_bytes': 373670,
 'downloader/response_count': 27,
 'downloader/response_status_count/200': 7,
 'downloader/response_status_count/302': 20,
 'elapsed_time_seconds': 315.86914,
 'feedexport/success_count/FileFeedStorage': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 8, 17, 4, 37, 32, 599873),
 'httpcompression/response_bytes': 1367567,
 'httpcompression/response_count': 6,
 'item_scraped_count': 5,
 'log_count/DEBUG': 32,
 'log_count/INFO': 16,
 'memusage/max': 66494464,
 'memusage/startup': 56147968,
 'response_received_count': 7,
 'robotstxt/request_count': 2,
 'robotstxt/response_count': 2,
 'robotstxt/response_status_count/200': 2,
 'scheduler/dequeued': 25,
 'scheduler/dequeued/memory': 25,
 'scheduler/enqueued': 25,
 'scheduler/enqueued/memory': 25,
 'start_time': datetime.datetime(2021, 8, 17, 4, 32, 16, 730733)}
2021-08-17 00:37:32 [scrapy.core.engine] INFO: Spider closed (finished)

I noticed the redirects, so I tried to disable them with "dont_redirect". After that the script prints the correct URL but shows an error, because the spider never reached the page and so couldn't extract any field.

Isaac
  • Because you haven't shown us your real code, we can't really help you very much. You would see this if your `yield` was returning a dict stored in a member variable that you were updating each time, instead of creating one anew. Because you have filtered your code so heavily, we can't tell. – Tim Roberts Aug 17 '21 at 04:08
  • It's exported from `.parse()`. Please improve your question by adding related source code. – Simba Aug 17 '21 at 04:22
  • Hi @TimRoberts, sorry for the trouble. I updated the post with the code and added the logs. I noticed the redirects and added a meta to the yield with "dont_redirect" and "handle_httpstatus_list":[302], but it gives me an error because the spider couldn't reach the page. – Isaac Aug 17 '21 at 04:46

1 Answer

While playing around with curl and redirects on your domain of interest (ebird.org), it became clear that cookies are needed for the redirects to eventually resolve correctly. However, reusing the same cookie session for every request (the default Scrapy behavior) seems to cause the strange redirect behavior you are seeing.
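To make that diagnosis concrete, here is a toy model in plain Python. It is only a sketch under a loud assumption: that the site remembers, per cookie session, which page to return to after the login round trip. `ToyLoginServer` and `crawl` are hypothetical names, and nothing here is eBird's real server logic.

```python
# Toy model only: the server behavior below is an assumption made for
# illustration, not eBird's real implementation.


class ToyLoginServer:
    """Remembers, per cookie session, which page to return to after login."""

    def __init__(self):
        self.return_to = {}  # session id -> most recently requested URL

    def request_page(self, session_id, url):
        # First hop: store the target, then (conceptually) redirect to login.
        self.return_to[session_id] = url

    def finish_login(self, session_id):
        # Last hop: redirect back to whatever this session stored last.
        return self.return_to[session_id]


def crawl(urls, session_for):
    """Send every first hop before any last hop, like the ordering in the log."""
    server = ToyLoginServer()
    for i, url in enumerate(urls):
        server.request_page(session_for(i), url)
    return [server.finish_login(session_for(i)) for i in range(len(urls))]


urls = [
    "https://ebird.org/species/ostric2",
    "https://ebird.org/species/grerhe1",
    "https://ebird.org/species/lesrhe2",
]

shared = crawl(urls, session_for=lambda i: 0)    # one cookie jar for everything
separate = crawl(urls, session_for=lambda i: i)  # one cookie jar per request

print(shared)    # every entry is the last URL
print(separate)  # each entry is its own URL
```

With a single shared session every request ends up at the last stored URL, which matches the five identical items in the question; with one session per request each URL resolves to itself.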

The fix is to use a distinct cookie session for each Request:

    ...

    def start_requests(self):
        for i, url in enumerate(self.start_urls):
            print("---------------------------")
            print(url)
            print("---------------------------")
            yield scrapy.Request(
                url=url, callback=self.parse, dont_filter=True, meta={"cookiejar": i}
            )

    ...

Note that if you later issue follow-up requests from the response of each start request, you will need to explicitly re-attach the cookie jar via meta={'cookiejar': response.meta['cookiejar']} each time.

See also this answer

lemonhead
  • This works very well, thanks a lot! But may I ask one more question about this, because it isn't clear to me? If every request has its own cookie, do I still need to limit the concurrent requests per domain? I mean, could I make N requests with N cookies without any problem? The question you mentioned asks something similar, but nobody commented on that. I ask because I noticed the scraping is so slow. – Isaac Aug 17 '21 at 19:23
  • Yes, I think it should work fine with concurrency. I used the default settings when testing w/o any delays or modifications to # concurrent requests per domain – lemonhead Aug 17 '21 at 20:57
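On the slowness: the settings posted in the question allow one request at a time with a long delay between requests. A settings.py sketch of returning to Scrapy's defaults, which is what the last comment reports testing with, might look like the following. The three throttling values are Scrapy's documented defaults; whether the site tolerates that request rate is an assumption you would need to check yourself.

```python
# settings.py sketch: Scrapy's default throttling values, written out explicitly.
# Deleting the overrides from the question has the same effect.

DOWNLOAD_DELAY = 0                  # default: no fixed pause between requests
CONCURRENT_REQUESTS = 16            # default global concurrency
CONCURRENT_REQUESTS_PER_DOMAIN = 8  # default per-domain concurrency

COOKIES_ENABLED = True              # default; the cookiejar meta key relies on it
COOKIES_DEBUG = True                # optional: log cookies sent and received
```

`COOKIES_DEBUG` is off by default; turning it on while testing makes it easier to confirm that each request really uses its own session.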