I am trying to scrape the table of all recorded events from the website http://southasiaterrorism.trfetzer.com/districts/17497-IND-Nandurbar.html. I am using a Scrapy spider for it, but I can't get the table because it seems to be loaded dynamically. I also tried Selenium, but with no result: I got the same static HTML page without the table loaded. Any help would be greatly appreciated.
No, it's not loaded dynamically. Just check the page source: inside the `script` tag there is a list of all those table elements, so just extract that. No need for Selenium here. – Stack Oct 25 '17 at 17:55
But I don't see why I got a downvote. Maybe it's simple for some people, but I am a newbie in all of this. – Sirak Ghazaryan Oct 25 '17 at 19:00
It doesn't matter, just keep learning : ) @Sirak Ghazaryan – Stack Oct 26 '17 at 11:11
1 Answer
As mentioned by @Stack, the content is not loaded dynamically; it's in the page, inside the <script> tags. You can try something like this:
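    from urllib.request import urlopen  # urllib2.urlopen on Python 2
    from bs4 import BeautifulSoup

    url = "http://southasiaterrorism.trfetzer.com/districts/17497-IND-Nandurbar.html"
    page = urlopen(url).read()
    soup = BeautifulSoup(page, "html.parser")
    # Skip the first two rows, then print the cells of each remaining row.
    for tr in soup.find_all('tr')[2:]:
        tds = tr.find_all('td')
        print(tds)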
From this question.
Note: this code is untested.
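If the rows only exist as text inside a <script> tag, find_all('tr') will not see them, and the script text has to be searched directly. Below is a rough sketch of that idea. It makes several assumptions that are not confirmed for this site: that the script assigns a JSON-compatible array to a variable (the name `events` and the sample rows are made up for illustration), and it uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra installs. With BeautifulSoup, soup.find_all('script') would play the role of the collector class.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical stand-in for the downloaded page. The real page's script
# contents and variable name may well differ.
SAMPLE_HTML = """
<html><body>
<table id="events"></table>
<script>
var events = [["2010-01-01", "Nandurbar", "Example event A"],
              ["2011-01-01", "Nandurbar", "Example event B"]];
</script>
</body></html>
"""


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


def extract_rows(html, var_name="events"):
    """Return the array assigned to `var_name` in a script, or [] if not found."""
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;", re.DOTALL
    )
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for text in collector.scripts:
        match = pattern.search(text)
        if match:
            # Only works when the JavaScript array literal is also valid JSON.
            return json.loads(match.group(1))
    return []


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

If the array in the real page is not valid JSON (single quotes, trailing commas, unquoted keys), json.loads will raise an error and the regular expression would need to pull out individual fields instead.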

jdoe
Thanks, indeed the solution was to use BeautifulSoup, but I also use a regexp to fetch the needed data. – Sirak Ghazaryan Oct 25 '17 at 19:01