
I am trying to access this portal from a Google Cloud instance: http://mca.gov.in/. I am not sure whether someone else misused it in the past or there is some other reason, but it looks like they have blocked every GCP IP address. The website simply doesn't load from the instance.

I am using Selenium and need a way to set up a proxy dynamically, at either the browser or the server level. Can you suggest the best way to go about it? I need to download one file every day from this portal. Done manually, the entire thing takes less than 2 minutes.

The website's TOS, quoted below, permit automated scraping:

Acceptable use of MCA Searchable Databases

MCA searchable databases are designed to meet the needs of a wide range of users wishing to interrogate our information on-line. Due to limitations of equipment and bandwidth, they are not intended to be a source for bulk downloads.

Individuals, companies, IP addresses or blocks of IP addresses who deny or degrade service to other users by generating unusually high numbers of daily database accesses, whether generated manually or in an automated fashion, may be denied access to these services without notice.

Since my use case is to download one file every day, this should not be an issue as far as the TOS are concerned.

Abhay
  • I want to understand how I can implement the proxy at the browser or server level. – Abhay Feb 21 '21 at 06:50
  • Which browser are you using? Would it be possible for you to do this with Python requests? – CodeIt Feb 21 '21 at 07:08
  • To set or change a proxy, we use the following: options.add_argument('--proxy-server=%s' % proxy) – Arundeep Chohan Feb 21 '21 at 10:07
  • We are using Chromedriver. I tried options.add_argument('--proxy-server=%s' % proxy), but the IP is not changing. It shows the same IP as the system. – Abhay Feb 22 '21 at 05:46
  • There is a similar question here: https://stackoverflow.com/questions/11450158/how-do-i-set-proxy-for-chrome-in-python-webdriver. Have you tried those snippets? – Sergiusz Mar 08 '21 at 09:01
  • @Abhay I'm facing the same issue. Did you manage to change the IP from within GCP? Neither '--proxy-server' nor Proxy() with 'capabilities' solved the issue for me. – Sébastien De Spiegeleer Jul 05 '21 at 14:45
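
For reference, here is a minimal sketch of the `--proxy-server` approach discussed in the comments above. It is not a confirmed fix for the GCP block. The proxy address `203.0.113.10:3128` is a placeholder from a documentation-only IP range, the helper names (`proxy_argument`, `build_driver`, `current_ip`) are made up for illustration, and `https://api.ipify.org` is just one public IP echo service assumed here for checking the outbound address. As far as I know, Chrome does not accept credentials embedded in this flag, so the helper rejects them rather than silently passing them through.

```python
from urllib.parse import urlsplit

# Schemes assumed here to be acceptable in Chrome's --proxy-server flag.
SUPPORTED_SCHEMES = {"http", "https", "socks4", "socks5"}


def proxy_argument(proxy_url: str) -> str:
    """Build the Chrome command-line flag for a proxy URL such as
    'http://203.0.113.10:3128' (placeholder address)."""
    parts = urlsplit(proxy_url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if parts.username or parts.password:
        # Assumption: the flag does not handle user:password pairs,
        # so an authenticated proxy needs a different mechanism.
        raise ValueError("credentials in the proxy URL are not supported")
    if not parts.hostname or parts.port is None:
        raise ValueError("proxy URL must include a host and a port")
    return f"--proxy-server={parts.scheme}://{parts.hostname}:{parts.port}"


def build_driver(proxy_url: str):
    """Start Chrome through the given proxy. Requires selenium and a
    matching chromedriver; imported lazily so the helper above can be
    used without them."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(proxy_url))
    # The options object must be passed to the driver, otherwise the
    # flag is never applied.
    return webdriver.Chrome(options=options)


def current_ip(driver) -> str:
    """Return the outbound IP as seen by an external echo service."""
    from selenium.webdriver.common.by import By

    driver.get("https://api.ipify.org")
    return driver.find_element(By.TAG_NAME, "body").text.strip()
```

Calling `current_ip(build_driver("http://203.0.113.10:3128"))` with a real proxy address should report the proxy's IP rather than the instance's. If it still shows the system IP, it may be worth checking that the options object actually reaches `webdriver.Chrome` and that the proxy does not require authentication.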

0 Answers