Using selenium with scrapy

Question

I am trying to scrape the table of all recorded events from the website http://southasiaterrorism.trfetzer.com/districts/17497-IND-Nandurbar.html. I am using a Scrapy spider for it, but I can't get that table because it's loaded dynamically. I tried using Selenium, but with no result: I got the same static HTML page without the table loaded. Any help would be greatly appreciated.

No, it's not loaded dynamically. Just check the page source: inside the `script` tag there is a list of all those table elements, so just extract that. There is no need for Selenium here. – Stack, Oct 25 '17 at 17:55
But I don't see why I got a downvote. Maybe for someone it's simple, but I am a newbie in all these things. – Sirak Ghazaryan, Oct 25 '17 at 19:00
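Stack's check can also be done programmatically: fetch the raw HTML (no JavaScript runs) and look for a value that is visible in the rendered table inside the `<script>` blocks. The sketch below uses the standard library's `html.parser` rather than Scrapy or BeautifulSoup; the sample page, the class name `ScriptCollector`, and the search string are made up for illustration and are not taken from the real site.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the raw text of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # html.parser treats <script> content as raw text, so any markup
        # inside it arrives here as plain data rather than as tags.
        if self._in_script:
            self.scripts[-1] += data


def scripts_containing(html, needle):
    """Return the <script> bodies that mention `needle`."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return [s for s in collector.scripts if needle in s]


# Hypothetical page: the table is empty in the HTML, but its rows are in a script.
sample = """
<table id="events"></table>
<script>var rows = ["<tr><td>2005-01-01</td><td>Nandurbar</td></tr>"];</script>
"""
print(len(scripts_containing(sample, "Nandurbar")))  # 1
```

If the real page returns a non-empty list for a value you can see in the browser, the data is already in the static HTML and Selenium is not needed.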

score 0 · Accepted Answer · answered Oct 25 '17 at 18:02


As mentioned by @Stack, the content is not loaded dynamically; it's in the page inside the <script> tags. You can try something like this:

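import urllib2  # Python 2; on Python 3 use urllib.request.urlopen instead
from bs4 import BeautifulSoup

url = 'http://southasiaterrorism.trfetzer.com/districts/17497-IND-Nandurbar.html'
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly
for tr in soup.find_all('tr')[2:]:  # skip the first two rows
    tds = tr.find_all('td')
    print(tds)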

From this question.

Note: this code is untested.


jdoe

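One caveat on the snippet above, assuming the rows really sit inside a JavaScript string: HTML parsers treat `<script>` content as raw text, so `find_all('tr')` on the outer document would most likely not see those rows. One way around this is to pull each script body out first and parse it again as HTML. The sketch below does that with a regular expression plus the standard library's `html.parser` instead of BeautifulSoup; `RowParser`, `rows_from_scripts`, and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser

# Grab the body of each <script> element; non-greedy so scripts stay separate.
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


class RowParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text of the cell being read, or None outside <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = ""

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append(self._cell.strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data


def rows_from_scripts(html):
    """Re-parse every <script> body as HTML and return the table rows found."""
    rows = []
    for body in SCRIPT_RE.findall(html):
        parser = RowParser()
        parser.feed(body)
        parser.close()
        rows.extend(parser.rows)
    return rows


sample = """
<table></table>
<script>var rows = ["<tr><td>2005-01-01</td><td>Nandurbar</td></tr>",
                    "<tr><td>2006-02-03</td><td>Nandurbar</td></tr>"];</script>
"""
print(rows_from_scripts(sample))
# [['2005-01-01', 'Nandurbar'], ['2006-02-03', 'Nandurbar']]
```

With BeautifulSoup the same idea applies: take the script tag's `.string` and pass it to a second `BeautifulSoup(...)` call.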


Thanks, indeed the solution was to use BeautifulSoup, but I also used a regexp to fetch the needed data. – Sirak Ghazaryan Oct 25 '17 at 19:01
I suggest using Python requests instead of urllib2. – PHA Oct 26 '17 at 16:50
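Combining the two comments, here is a minimal sketch that fetches the page with requests and uses a regular expression to pull a JavaScript array out of it. It assumes, hypothetically, that the script assigns the data to a variable called `events` as a JSON-compatible array literal; the real variable name and format have to be checked in the page source. `fetch_page` needs network access and the third-party `requests` package, and both helper names are illustrative.

```python
import json
import re

# Hypothetical pattern: matches `var events = [ ... ];` and captures the array.
EVENTS_RE = re.compile(r"var\s+events\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def fetch_page(url):
    """Download the raw HTML without running any JavaScript."""
    import requests  # third-party; imported lazily so extract_events works without it

    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text


def extract_events(html):
    """Return the parsed `events` array, or an empty list if it is absent."""
    match = EVENTS_RE.search(html)
    if match is None:
        return []
    # json.loads only works if the literal is valid JSON
    # (double-quoted strings, no trailing commas).
    return json.loads(match.group(1))


sample = '<script>var events = [["2005-01-01", "Nandurbar"], ["2006-02-03", "Nandurbar"]];</script>'
print(extract_events(sample))
# [['2005-01-01', 'Nandurbar'], ['2006-02-03', 'Nandurbar']]

# Real use (network required):
# html = fetch_page("http://southasiaterrorism.trfetzer.com/districts/17497-IND-Nandurbar.html")
# events = extract_events(html)
```

Parsing the array with `json` instead of scraping individual cells keeps the regex small; if the literal uses single quotes or other non-JSON syntax, that step would need adjusting.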
