
There's the Fortune World's Most Admired Companies page with a list of 50 companies, and I am trying to parse that list and export it to a CSV file.

The code I have only gets me 20, because the page loads more items as you scroll down. Is there a way to simulate the scrolling, or to make the page load entirely?

from lxml import html
import requests

def schindler(max_rank):  # create a list of the companies
    page = requests.get('http://beta.fortune.com/worlds-most-admired-companies/list/')
    tree = html.fromstring(page.content)
    names = []
    position = 1

    while position <= max_rank:
        # Grab the company name from the <li> at this rank
        names.extend(tree.xpath('//*[@id="pageContent"]/div[2]/div/div/div[1]/div[1]/ul/li[' + str(position) + ']/a/span[2]/text()'))
        position += 1

    return names

(That is only the list creation; there is no problem with the CSV exporter.)

I then print it to check, and only 20 items appear in the list:

print(schindler(50))
  • I'm going to say this is a duplicate of [scrape websites with infinite scrolling](http://stackoverflow.com/questions/12519074/scrape-websites-with-infinite-scrolling), but know this is crossing from "read static webpage" to "interact with webpage", which is an unfortunately large step. – Tadhg McDonald-Jensen Mar 11 '17 at 20:24
  • Thanks! I will check it. I tried looking for a preexisting answer and didn't find anything usable, probably because I lack the technical vocabulary to properly state my problem. – Trotus Mar 11 '17 at 20:45
  • The three relevant words were [python, scraping, and scrolling](https://www.google.ca/search?client=safari&rls=en&q=python+scaping+scrolling&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=KGLEWJSuL6aC8QeHyoGwDQ#q=python+scraping+scrolling&*); was it _scraping_ you were missing? (By the way, I like your reference to Gandalf.) – Tadhg McDonald-Jensen Mar 11 '17 at 20:47

1 Answer


It would appear that you can fetch the data directly as JSON. The 20 in the URL appears to be the rank at which to start, and the 30 the number of items to return.

Sample code:

import requests

# Offset 20, limit 30: fetches the remaining ranks in one request
url = "http://fortune.com/api/v2/list/1918408/expand/item/ordering/asc/20/30"

resp = requests.get(url)
for entry in resp.json()['list-items']:
    print(entry['rank'], entry['name'])
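If that interpretation of the URL holds, the whole list can be pulled by stepping the offset and writing the rows out to CSV. A sketch; the paging helper and the `list-items`/`rank`/`name` field names are assumptions based on the sample response above, and the endpoint may have changed since:

```python
import csv
import requests

# Assumed URL template: the trailing segments are {offset}/{limit},
# inferred from the ".../asc/20/30" example in the answer.
BASE = "http://fortune.com/api/v2/list/1918408/expand/item/ordering/asc/{offset}/{limit}"

def page_urls(total=50, page_size=20):
    # Build one URL per page until `total` ranks are covered.
    return [BASE.format(offset=o, limit=min(page_size, total - o))
            for o in range(0, total, page_size)]

def fetch_all(total=50):
    # Fetch every page and flatten the entries into (rank, name) tuples.
    rows = []
    for url in page_urls(total):
        resp = requests.get(url)
        resp.raise_for_status()
        rows.extend((e['rank'], e['name']) for e in resp.json()['list-items'])
    return rows

def write_csv(rows, path='most_admired.csv'):
    # Write a header row followed by one row per company.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(('rank', 'name'))
        writer.writerows(rows)
```

Usage would then be `write_csv(fetch_all(50))`, which avoids simulating any scrolling at all, since the page's own data source is queried directly.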