Not sure if this comes too late, but I'll add my two cents on this issue nonetheless.
I have to crawl and scrape quite a few websites that are built with a mix of React, JavaScript, and plain HTML.
Correct me if I'm wrong, but I believe what you mean is that some pages on the site already contain the data of interest (the data to be scraped) in the plain HTML, without involving JS. Hence, you want to tell the pages that need JS rendering apart from those that don't, to improve scraping efficiency.
To answer your question directly: there is no smart system a crawler can use to tell those two types of webpages apart without rendering them at least once, unless the URLs follow a pattern that lets you easily discern pages that need JS rendering from pages that only require plain HTML crawling.
If they don't, you can render each page at least once and write conditional code around the response. What I mean is: first crawl the target URL with plain Scrapy (no JS rendering), and if the response looks incomplete (assuming the missing data is not caused by a faulty element selector), crawl it a second time with a JS renderer.
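Below is a minimal sketch of that fallback pattern as a Scrapy spider. The URL, the CSS selectors, and the `render_js` meta flag (which you would handle in your own JS-rendering downloader middleware) are all placeholders, not part of any real site or library API:

```python
# Sketch: crawl plain HTML first, fall back to JS rendering only when needed.
import scrapy


class FallbackSpider(scrapy.Spider):
    name = "fallback"
    start_urls = ["https://example.com/products"]  # placeholder URL

    def parse(self, response):
        # First pass: plain HTML fetched by Scrapy's default downloader.
        items = response.css("div.product")  # placeholder selector
        if items:
            for item in items:
                yield {"title": item.css("h2::text").get()}
        else:
            # Data missing -> assume the page needs JS. Re-request it and let
            # a JS-rendering middleware (e.g. one built on Pyppeteer) pick it
            # up via this custom meta flag.
            yield scrapy.Request(
                response.url,
                callback=self.parse_rendered,
                meta={"render_js": True},  # handled by your own middleware
                dont_filter=True,          # allow re-crawling the same URL
            )

    def parse_rendered(self, response):
        # Second pass: same extraction logic on the JS-rendered HTML.
        for item in response.css("div.product"):
            yield {"title": item.css("h2::text").get()}
```

The nice part of this pattern is that the expensive JS render only happens for pages where the cheap plain-HTML pass actually came back incomplete.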
This brings me to my second point: if the webpages have no fixed URL pattern, you can simply use a faster, more lightweight JS renderer.
Selenium does have relatively high overhead for mass crawling (up to 0.5M pages in your case), since it was not built for that in the first place. You can check out Pyppeteer, an unofficial Python port of Puppeteer, Google's Node.js browser-automation library, which you can integrate with Scrapy fairly easily.
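For a rough idea of what the Pyppeteer side looks like, here is a minimal standalone sketch that fetches the fully rendered HTML of one page; the URL and the `waitUntil` setting are assumptions you will want to tune for your sites:

```python
# Sketch: fetch the JS-rendered HTML of a single page with Pyppeteer.
import asyncio
from pyppeteer import launch


async def render(url: str) -> str:
    browser = await launch(headless=True)
    page = await browser.newPage()
    # 'networkidle2' waits until the page has (mostly) finished loading JS.
    await page.goto(url, {"waitUntil": "networkidle2"})
    html = await page.content()  # fully rendered DOM as an HTML string
    await browser.close()
    return html


if __name__ == "__main__":
    html = asyncio.get_event_loop().run_until_complete(
        render("https://example.com")  # placeholder URL
    )
    print(len(html))
```

You could wrap something like this in a Scrapy downloader middleware that only kicks in for requests carrying the fallback flag from the spider above.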
It's worth reading up on the pros and cons of Puppeteer versus Selenium to see which fits your use case better. One major limitation is that Puppeteer only supports Chrome/Chromium for now.