
There's the Fortune World's Most Admired Companies page with a list of 50 companies, and I am trying to parse that list and export it to a CSV file.

The code I have only gets me 20, because the page loads more items as you scroll down. Is there a way to simulate the scrolling, or to make the page load entirely?

from lxml import html
import requests

def schindler(max_rank):  # create a list of the companies
    page = requests.get('http://beta.fortune.com/worlds-most-admired-companies/list/')
    tree = html.fromstring(page.content)
    names = []
    position = 1

    while position <= max_rank:
        # Grab the company name from the <li> at this rank
        names.extend(tree.xpath('//*[@id="pageContent"]/div[2]/div/div/div[1]/div[1]/ul/li[' + str(position) + ']/a/span[2]/text()'))
        position += 1

    return names

(That is only the list creation; there is no problem with the CSV exporter.)

I then print it to check, and only 20 items appear in the list:

print(schindler(50))
  • I'm going to say this is a duplicate of [scrape websites with infinite scrolling](http://stackoverflow.com/questions/12519074/scrape-websites-with-infinite-scrolling), but know this is crossing from "read static webpage" to "interact with webpage", which is an unfortunately large step. – Tadhg McDonald-Jensen Mar 11 '17 at 20:24
  • Thanks! I will check it. I tried looking for a preexisting answer and didn't find anything usable, probably because I lack the technical vocabulary to properly state my problem. – Trotus Mar 11 '17 at 20:45
  • The three relevant words were [python, scraping, and scrolling](https://www.google.ca/search?client=safari&rls=en&q=python+scaping+scrolling&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=KGLEWJSuL6aC8QeHyoGwDQ#q=python+scraping+scrolling&*); was it _scraping_ you were missing? (By the way, I like your reference to Gandalf.) – Tadhg McDonald-Jensen Mar 11 '17 at 20:47

1 Answer


It would appear that you can fetch the data directly as JSON. The 20 in the URL appears to be the rank at which to start, and the 30 the number of items to return.

Sample code:

import requests

# Offset 20, limit 30: fetches the remaining ranks in one request
url = "http://fortune.com/api/v2/list/1918408/expand/item/ordering/asc/20/30"

resp = requests.get(url)
for entry in resp.json()['list-items']:
    print(entry['rank'], entry['name'])
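If that interpretation of the URL holds, the whole list can be pulled by stepping the offset and writing the rows out to CSV. A sketch; the paging helper and the `list-items`/`rank`/`name` field names are assumptions based on the sample response above, and the endpoint may have changed since:

```python
import csv
import requests

# Assumed URL template: the trailing segments are {offset}/{limit},
# inferred from the ".../asc/20/30" example in the answer.
BASE = "http://fortune.com/api/v2/list/1918408/expand/item/ordering/asc/{offset}/{limit}"

def page_urls(total=50, page_size=20):
    # Build one URL per page until `total` ranks are covered.
    return [BASE.format(offset=o, limit=min(page_size, total - o))
            for o in range(0, total, page_size)]

def fetch_all(total=50):
    # Fetch every page and flatten the entries into (rank, name) tuples.
    rows = []
    for url in page_urls(total):
        resp = requests.get(url)
        resp.raise_for_status()
        rows.extend((e['rank'], e['name']) for e in resp.json()['list-items'])
    return rows

def write_csv(rows, path='most_admired.csv'):
    # Write a header row followed by one row per company.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(('rank', 'name'))
        writer.writerows(rows)
```

Usage would then be `write_csv(fetch_all(50))`, which avoids simulating any scrolling at all, since the page's own data source is queried directly.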