In the FEED_URI setting you can use printf-style placeholders that are replaced at export time; besides built-in ones like %(time)s, any named placeholder is filled from the spider attribute of the same name. For example, a domain name can be included in the file name through a domain spider attribute like this:
FEED_URI = 's3://my-bucket/%(domain)s/%(time)s.json'
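For that placeholder to resolve, the spider needs a domain attribute. A minimal sketch (the spider name and the way the attribute is set are my assumptions, not part of your setup) would pass it in as a -a command-line argument:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def __init__(self, domain=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # %(domain)s in FEED_URI is filled from this spider attribute.
        self.domain = domain
        self.start_urls = [f"https://{domain}/"]

    def parse(self, response):
        yield {"title": response.css("title::text").get()}

You would then run it once per domain, e.g. scrapy crawl quotes -a domain=toscrape.com.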
This solution only works if the spider is run once per domain, though, and since you haven't explicitly said so, I'll assume a single run crawls multiple domains.
If you know all the domains beforehand, you can generate the value of the FEEDS setting programmatically and use item filtering through the item_filter feed option.
# Assumes that items have a "domain" field and that all target domains are
# listed in the ALL_DOMAINS variable.
class DomainFilter:
    def __init__(self, feed_options):
        # Each feed passes its target domain through a custom feed option.
        self.domain = feed_options["domain"]

    def accepts(self, item):
        return item["domain"] == self.domain

ALL_DOMAINS = ["toscrape.com", ...]

FEEDS = {
    f"s3://mybucket/{domain}.jsonl": {
        "format": "jsonlines",
        "item_filter": DomainFilter,
        "domain": domain,  # read back by DomainFilter.__init__
    }
    for domain in ALL_DOMAINS
}
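For completeness, here's a rough sketch of one way the "domain" field that DomainFilter checks could be populated; the spider and field names are assumptions, adapt them to your items:

from urllib.parse import urlparse

import scrapy

class MultiDomainSpider(scrapy.Spider):
    name = "multi"
    start_urls = [f"https://{domain}/" for domain in ("toscrape.com",)]

    def parse(self, response):
        # Derive the item's domain from the response URL; strip "www." so it
        # matches the ALL_DOMAINS entries used as feed keys.
        # (str.removeprefix needs Python 3.9+.)
        domain = urlparse(response.url).netloc.removeprefix("www.")
        yield {"domain": domain, "url": response.url}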