I've seen people using Airflow to schedule hundreds of scraping jobs through Scrapyd daemons. However, one thing that seems to be missing in Airflow is monitoring of long-running jobs like scraping: getting the number of pages and items scraped so far, or the number of URLs that have failed or been retried without success.
What are my options for monitoring the current status of long-running jobs? Is there something already available, or do I need to resort to external tools like Prometheus and Grafana and instrument the Scrapy spiders myself?
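For context, the DIY route I have in mind would look roughly like the sketch below: a Scrapy extension that hooks the built-in signals and exposes counters over `prometheus_client`. The metric names, the `PROMETHEUS_PORT` setting, and the default port are placeholders I made up, not anything standard.

```python
from prometheus_client import Counter, start_http_server
from scrapy import signals


class PrometheusStatsExtension:
    """Expose basic crawl progress as Prometheus metrics (sketch)."""

    def __init__(self, port):
        self.port = port
        # Per-spider counters for pages, items, and spider callback errors.
        self.pages = Counter("scrapy_pages_crawled", "Responses received", ["spider"])
        self.items = Counter("scrapy_items_scraped", "Items scraped", ["spider"])
        self.errors = Counter("scrapy_spider_errors", "Spider callback errors", ["spider"])

    @classmethod
    def from_crawler(cls, crawler):
        # PROMETHEUS_PORT is a hypothetical custom setting, not a Scrapy built-in.
        ext = cls(port=crawler.settings.getint("PROMETHEUS_PORT", 9410))
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(ext.response_received, signal=signals.response_received)
        crawler.signals.connect(ext.item_scraped, signal=signals.item_scraped)
        crawler.signals.connect(ext.spider_error, signal=signals.spider_error)
        return ext

    def spider_opened(self, spider):
        # Start the metrics HTTP endpoint when the spider starts.
        start_http_server(self.port)

    def response_received(self, response, request, spider):
        self.pages.labels(spider=spider.name).inc()

    def item_scraped(self, item, response, spider):
        self.items.labels(spider=spider.name).inc()

    def spider_error(self, failure, response, spider):
        self.errors.labels(spider=spider.name).inc()
```

The extension would be enabled via the `EXTENSIONS` setting, and each Scrapyd process would need its own port to avoid collisions. Is something like this the expected approach, or is there a ready-made integration I'm overlooking?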