
I want to get a list of URLs for the posts on this page and then scrape the data I need from each of them:

import requests
from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = 'https://www.instagram.com/louisvuitton/'
driver = webdriver.Firefox()
driver.get(url)

soup = BeautifulSoup(driver.page_source, 'lxml')
data1 = soup.find_all('div', {'class': '_cmdpi'})
list1 = []
for links in data1:
    list1.append(links.a['href'])
print(list1)

But why does this return only the first link rather than a list of all of them?

1 Answer


That's because there are multiple links, but only one div with class="_cmdpi", so data1 is a list consisting of a single element, and links.a returns only the first <a> tag inside it. Try the code below to get the required references without using bs4:

from selenium.webdriver.common.by import By

url = 'https://www.instagram.com/louisvuitton/'
driver = webdriver.Firefox()
driver.get(url)

# collect the href of every anchor inside the div, not just the first one
# (Selenium 4 syntax for the former find_elements_by_css_selector)
links = [a.get_attribute('href') for a in driver.find_elements(By.CSS_SELECTOR, 'div._cmdpi a')]
print(links)
– Andersson
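
For completeness, the original BeautifulSoup approach can be fixed the same way: iterate over every `<a>` inside the div instead of taking just the first one. A minimal sketch, assuming Instagram's 2017 `_cmdpi` markup (the class name has likely changed since):

import selenium.webdriver as webdriver
from bs4 import BeautifulSoup

url = 'https://www.instagram.com/louisvuitton/'
driver = webdriver.Firefox()
driver.get(url)

soup = BeautifulSoup(driver.page_source, 'lxml')
# soup.select() takes a CSS selector and returns all matches,
# so this collects every anchor inside the div, not just the first.
# Note: bs4 returns the raw href attribute, which may be a relative path.
links = [a['href'] for a in soup.select('div._cmdpi a')]
print(links)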
  • Perfect! Any solution to get more than just 12 results, since the `load more` button doesn't redirect to another page? – Niranga Sithara Aug 27 '17 at 13:19
  • You can click the `load more` button once and then [scroll the page down](https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python) in a `while` or `for` loop before scraping the links (a combined sketch follows after these comments) – Andersson Aug 27 '17 at 13:26
  • Hi... I succeeded in getting the list of links and also scraped the wanted data from each post. But since they all use JavaScript I have to keep using Selenium, which means it keeps opening new web browsers. I want to run the loop over more than 1000 links, and I can't see that working. Any suggestions? – Niranga Sithara Aug 27 '17 at 16:44
  • I'm not sure I understood the issue... You can visit each page in a loop like `for link in links: driver.get(link)` without opening a new browser – Andersson Aug 27 '17 at 16:49
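
Putting the two suggestions from the comments together, here is a minimal sketch: scroll in a loop to trigger Instagram's dynamic loading, collect the links, then visit each one with the same browser instance. The scroll count, sleep time, and the `_cmdpi` selector are assumptions that will need tuning for the live page:

import time
import selenium.webdriver as webdriver
from selenium.webdriver.common.by import By

url = 'https://www.instagram.com/louisvuitton/'
driver = webdriver.Firefox()
driver.get(url)

# scroll down a few times so more posts are loaded via JavaScript
# (5 scrolls and a 2-second pause are arbitrary; adjust as needed)
for _ in range(5):
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(2)

links = [a.get_attribute('href') for a in driver.find_elements(By.CSS_SELECTOR, 'div._cmdpi a')]

# reuse the same browser instance for every post instead of opening a new one
for link in links:
    driver.get(link)
    # ... scrape the wanted data from driver.page_source here ...

driver.quit()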