ScrapyJs Javascript is Not Enabled

Question

I am trying to crawl a website whose content is generated by JavaScript.

I installed Scrapy and Splash.

Splash is started with this command:

sudo docker run -p 8050:8050 -v /etc/splash/proxy-profiles:/etc/splash/proxy-profiles scrapinghub/splash
2015-08-21 07:21:06+0000 [-] Log opened.
2015-08-21 07:21:06.483344 [-] Splash version: 1.7
2015-08-21 07:21:06.490230 [-] Qt 4.8.1, PyQt 4.9.1, WebKit 534.34, sip 4.13.2, Twisted 15.2.1, Lua 5.2
2015-08-21 07:21:06.490505 [-] Open files limit: 524288
2015-08-21 07:21:06.490745 [-] Open files limit increased from 524288 to 1048576
2015-08-21 07:21:06.699607 [-] Xvfb is started: ['Xvfb', ':1087', '-screen', '0', '1024x768x24']
2015-08-21 07:21:06.808450 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2015-08-21 07:21:06.929580 [-] verbosity=1
2015-08-21 07:21:06.929964 [-] slots=50
2015-08-21 07:21:06.930484 [-] Web UI: enabled, Lua: enabled (sandbox: enabled), Proxy Server: enabled
2015-08-21 07:21:06.931420 [-] Site starting on 8050
2015-08-21 07:21:06.931640 [-] Starting factory <twisted.web.server.Site instance at 0x1b5b3f8>
2015-08-21 07:21:06.938232 [-] SplashProxyServerFactory starting on 8051
2015-08-21 07:21:06.938468 [-] Starting factory <splash.proxy_server.SplashProxyServerFactory instance at 0x1b5bcf8>

When I request the page through render.html, the response says "Javascript is not enabled. Please enable JavaScript in your browser".

import scrapy


class xxxxxSpider(scrapy.Spider):
    name = "sahibinden"
    start_urls = ["xxxxx"]

    def start_requests(self):
        # Route every start URL through Splash's render.html endpoint.
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 0.5, 'proxy': 'xxxxx'}
                }
            })

    def parse(self, response):
        # Append the rendered <body> markup to a local file.
        with open("result.txt", "a") as myfile:
            myfile.write(str(response.css('body').extract()))

As far as I can tell, the settings are correct:

DOWNLOADER_MIDDLEWARES = {
    'scrapyjs.SplashMiddleware': 725,
}

SPLASH_URL = 'http://localhost:8050/'

DUPEFILTER_CLASS = 'scrapyjs.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapyjs.SplashAwareFSCacheStorage'

I scraped the website successfully once. Since then I get the "Javascript is not enabled in your browser" error.

In case it helps, this is the Splash output when I render the page:

2015-08-21 08:06:09.838076 [-] "172.17.42.1" - - [21/Aug/2015:08:06:09 +0000] "POST /render.html HTTP/1.1" 200 4048 "-" "Scrapy/1.0.3.post1+g83a06ed (+http://scrapy.org)"

I can't figure out what the problem is. Any help?
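
Editor's note: one way to tell a blocked response from a real page is to look for the message quoted above in the rendered HTML. The following is a minimal sketch under that assumption; the helper name looks_blocked and the constant BLOCK_MARKER are hypothetical and are not part of Scrapy, ScrapyJS or Splash.

```python
# Hypothetical helper: detects the block page by the message quoted in the question.
BLOCK_MARKER = "Javascript is not enabled"


def looks_blocked(html):
    """Return True if the rendered HTML contains the block message."""
    # Compare case-insensitively, since the site may vary the capitalisation.
    return BLOCK_MARKER.lower() in html.lower()
```

A check like this could be run on the response body inside parse to log or retry blocked requests instead of silently writing the block page to result.txt.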

Further Information

I deleted the virtual machine so that my IP address changed, then tried again. The first request returned results successfully, but the second request returned nothing. I think the website is blocking my IP address.

I also changed the bot to use random user agents. Still no change. — AnovaConsultancy, Aug 21 '15 at 12:57
Have you tried adjusting **CONCURRENT_REQUESTS** and **DOWNLOAD_DELAY** parameters? — gerosalesc, Aug 21 '15 at 18:52
I was using "Scrapy/1.0.3.post1+g83a06ed", then I changed it to "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36". — AnovaConsultancy, Aug 21 '15 at 23:22
Have you tried another website, just to check that your environment is correctly configured? — gerosalesc, Aug 24 '15 at 15:42
@German I think you did not read carefully. The first request does get the website. — AnovaConsultancy, Aug 25 '15 at 10:37
I did, friend, but there is still a chance this is environment related. — gerosalesc, Aug 25 '15 at 20:49
Do you know how the page checks whether or not JS is enabled? This may give you a clue. — gerosalesc, Aug 25 '15 at 20:57
I think I solved the problem. When I use Scrapy, the target website recognizes that it is a bot and adds my IP to a blacklist, so the second request does not return any response. I have now written the application in .NET instead, and it works perfectly. Thanks for your time. — AnovaConsultancy, Aug 26 '15 at 11:33
Could you provide an answer with the solution you chose and why? — Gallaecio, Jan 31 '19 at 15:16
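
Editor's note: the CONCURRENT_REQUESTS and DOWNLOAD_DELAY suggestion in the comments could look like the settings.py fragment below. This is only a sketch: the setting names are standard Scrapy settings, but the values are illustrative assumptions, not values tested against the site in the question, and slower crawling is not guaranteed to avoid an IP block.

```python
# settings.py fragment (illustrative values, assumed rather than tuned)

# Send one request at a time instead of Scrapy's default parallelism.
CONCURRENT_REQUESTS = 1
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Wait roughly this many seconds between consecutive requests.
DOWNLOAD_DELAY = 5

# Vary the actual wait between 0.5x and 1.5x of DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True
```

Because every request in this setup is sent to Splash first, the delay applies to the requests Scrapy sends to Splash, which in turn paces how often Splash fetches the target site.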
