How to Get a redirected URL

Question

Hi guys, I am trying to solve this and I don't really know what to do. I scraped this website, https://www.financialjuice.com/home, and saved the items to my database, and that worked successfully.

The issue is that when a scraped item is clicked in my app, it goes to Financial Juice first before going on to the original source of the news.

That is, Financial Juice might have a news item they got from the BBC, and my Scrapy spider picks up that item. When you click on the URL, it goes to Financial Juice first before going to the BBC.

What do you think I can do? Your suggestions are welcome.

Your question is still a little unclear. What exactly is the issue? – information_interchange, Nov 19 '17 at 04:01
I want to be able to get the link it is redirected to straight away, instead of first visiting Financial Juice before getting to the actual news source. – molecules, Nov 19 '17 at 04:04
If you check Financial Juice, you will notice that a loading page is shown on Financial Juice before the news source finally comes up. – molecules, Nov 19 '17 at 04:11

kmcodes · Answer 1 · 2017-11-20T06:39:10.067


Please share one of the scraped URLs. My assumption is that Financial Juice is not giving you the direct URL but one that redirects. For example, this is a link on the front page:

https://www.financialjuice.com/News/3772381/A-week-end-of-decision-for-Germany.aspx

which loads and then redirects to

http://www.forexlive.com/news/!/a-week-end-of-decision-for-germany-20171118

This helps them keep track of which links were visited from outside the website (social media sharing, etc.) and prevents exactly what you have done.

You will need to run a script that visits the link and then reads the URL after the last redirect.

For example, using urllib2 (Python 2), geturl() gives you the final URL of the opened object:

import urllib2
finalurl = urllib2.urlopen(initialurl, None, 1).geturl()
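
On Python 3, urllib2 no longer exists and its functionality lives in urllib.request. The following is a minimal sketch of the same idea, assuming the redirect is an ordinary HTTP 3xx redirect rather than one done by a script. The helper name resolve_final_url and the 10-second default timeout are illustrative choices, not part of the original answer.

```python
import urllib.request


def resolve_final_url(initial_url, timeout=10):
    """Follow HTTP redirects and return the URL of the final response.

    Only server-side (3xx) redirects are followed; a redirect performed
    by JavaScript in the page is invisible to urllib.
    """
    # urlopen follows 3xx redirects automatically; geturl() reports the
    # URL that was actually retrieved at the end of the chain.
    with urllib.request.urlopen(initial_url, timeout=timeout) as response:
        return response.geturl()
```

If resolve_final_url(url) returns the same URL that was passed in, the page is probably redirecting with a script, and a browser-based approach is needed instead.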

If the redirection is done by a script, then you need to use Selenium. I modified the code below for you and it worked quite well:

from selenium import webdriver
import time

chromepath = '/usr/bin/chromedriver'  # change this to your chromedriver path
# Selenium 3 style call; newer Selenium releases take the path via a Service object
driver = webdriver.Chrome(chromepath)
driver.get('https://www.financialjuice.com/News/3772381/A-week-end-of-decision-for-Germany.aspx')

# give the JavaScript redirect time to run before reading the URL
time.sleep(10)
print(driver.current_url)

driver.quit()
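
A fixed 10-second sleep waits the full time even when the redirect finishes sooner. One possible alternative is to poll the browser's URL until it changes. The sketch below is an illustration, not the original answer's code: the helper name wait_for_url_change and its default values are hypothetical, and it only assumes that the driver object exposes current_url, as Selenium drivers do.

```python
import time


def wait_for_url_change(driver, original_url, timeout=15.0, poll_interval=0.5):
    """Return driver.current_url once it differs from original_url.

    `driver` is any object with a `current_url` attribute, such as a
    Selenium WebDriver. If the URL has not changed before `timeout`
    seconds pass, the last URL seen is returned unchanged.
    """
    deadline = time.monotonic() + timeout
    current = driver.current_url
    # Keep polling while the browser is still on the starting URL.
    while current == original_url and time.monotonic() < deadline:
        time.sleep(poll_interval)
        current = driver.current_url
    return current
```

After driver.get(url), calling wait_for_url_change(driver, url) would take the place of time.sleep(10) followed by reading driver.current_url. Note that the comparison is a plain string check, so it would also fire if the site merely normalizes the starting URL. Selenium also ships WebDriverWait together with expected_conditions.url_changes, which covers the same case.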

edited Nov 20 '17 at 06:39

answered Nov 19 '17 at 04:42


How can this be achieved? – molecules Nov 19 '17 at 06:01
Added content to the end of my earlier answer to help others. See the original answer and mark it as "accepted answer" if it helps. Thanks. – kmcodes Nov 19 '17 at 06:06
This is redirected by a script. You cannot get it without a browser. – Rahul Nov 19 '17 at 06:15
If that's true, then we need to use Selenium. Added instructions above. – kmcodes Nov 19 '17 at 06:21
Please check the code added above; it can be repurposed for your use (you will need to give your chromedriver path). While scraping, you need to add a delay for the JS redirect to work. – kmcodes Nov 20 '17 at 06:40
