I'm fairly new to Scrapy and I've run into a problem. I'm trying to extract information from a webpage that uses buttons of this type:

<a id="" href="#" ... onclick="function()..."

I've been looking for examples but all of them work with href. Is there a solution? Do I need to use other tools to do the job?

Thanks

2 Answers


No, you can't do this with Scrapy alone, because Scrapy does not execute JavaScript. If you want to scrape this type of website, you can use Selenium. It drives a real browser, so it works well for JavaScript-heavy pages.

For more detail on why Scrapy doesn't work here and why Selenium is a better fit, see this similar question: Selenium vs scrapy. You may also want to read: Scraping Javascript Enabled Websites using Scrapy-Selenium

imxitiz
  • The thing is that I'm trying to migrate the code away from Selenium because it is far too slow, but the site is full of JS interactions. I've seen that there's a library called Splash that deals with JS. However, I haven't been able to find what I'm looking for. Thank you for your answer, though :) –  Jul 21 '21 at 11:10
  • Sorry, I'm not familiar with `splash`, so I can't explain it. I would have to research it and learn the basics before I could write anything further, which I'm not interested in doing right now. So I wrote what I knew and what I found while researching your problem. – imxitiz Jul 21 '21 at 13:55
  • Yes! Don't worry :D As soon as I get more information about this I'll update this question with, hopefully, an answer –  Jul 22 '21 at 09:27

You cannot "click" a button with Scrapy, but you can monitor the browser's network tab to see what request is sent on click. Take this page for example. When you click the Login button, a POST request is sent. You can easily send POST requests using Scrapy. Here is a code snippet:

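from scrapy import FormRequest

# Inside a spider callback: fill in the form found in `response` and submit it as a POST
r = FormRequest.from_response(response, formdata={'username': 'd', 'password': 'x'})
yield r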
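
When the click does not show up as a form submission, the target is sometimes written directly inside the onclick handler. Below is a minimal sketch of pulling it out with the standard library only. It assumes the handler carries the destination as its first quoted string, for example `goTo('/items?page=2')`. That handler shape, the names `OnclickCollector` and `urls_from_onclick`, the sample markup, and the example.com URL are all made up for illustration and are not taken from the page in the question.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class OnclickCollector(HTMLParser):
    """Collect the onclick attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.handlers = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            onclick = dict(attrs).get("onclick")
            if onclick:
                self.handlers.append(onclick)


# Assumption: the first quoted string literal in the handler is the target URL.
QUOTED_ARG = re.compile(r"""['"]([^'"]+)['"]""")


def urls_from_onclick(html, base_url):
    """Return absolute URLs guessed from the first quoted string in each onclick."""
    parser = OnclickCollector()
    parser.feed(html)
    urls = []
    for handler in parser.handlers:
        match = QUOTED_ARG.search(handler)
        if match:
            # Resolve relative paths against the page the link was found on.
            urls.append(urljoin(base_url, match.group(1)))
    return urls


# Made-up markup for illustration only.
SAMPLE = '<a id="" href="#" onclick="goTo(\'/items?page=2\')">Next</a>'
print(urls_from_onclick(SAMPLE, "https://example.com/list"))
```

In a Scrapy callback, the same regular expression could be applied to the strings returned by `response.css('a::attr(onclick)').getall()`, with each recovered URL passed to `response.follow`. If the handler builds its URL at runtime, this approach will not find it and a browser-based tool is still needed.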
Upendra
  • Sorry, I thought I had answered you. The problem here is that it is not a form. It's a simple button (it navigates to another place in the site), and when I press it, there doesn't seem to be a GET/POST request in the network tab. However, I've reached the conclusion that in the end I would be building the same tool that I already have, only a bit more efficient at data extraction thanks to Scrapy, which is not the problem and not the slowest part of the code. Thank you very much, though –  Jul 26 '21 at 11:41