And a Selenium version:
from io import StringIO

from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd
import time

driver = webdriver.Firefox(executable_path='c:/program/geckodriver.exe')
url = 'https://www.privateequityinternational.com/database/#/pei-300'
driver.get(url)  # load the page
time.sleep(5)  # give the JavaScript-rendered table time to appear
page = driver.page_source  # grab the fully rendered HTML
driver.quit()  # shut the browser down

soup = BeautifulSoup(page, 'html.parser')  # parse the rendered HTML
table = soup.select_one('table.au-target.pux--responsive-table')  # the ranking table
dfs = pd.read_html(StringIO(table.prettify()))  # StringIO avoids the literal-HTML deprecation in newer pandas
df = pd.concat(dfs)
df.to_csv('file.csv')  # save the data to file.csv
print(df.head(25))
prints:
Ranking Name City, Country (HQ)
0 1 Blackstone New York, United States
1 2 The Carlyle Group Washington DC, United States
2 3 KKR New York, United States
3 4 TPG Fort Worth, United States
4 5 Warburg Pincus New York, United States
5 6 NB Alternatives New York, United States
6 7 CVC Capital Partners Luxembourg, Luxembourg
7 8 EQT Stockholm, Sweden
8 9 Advent International Boston, United States
9 10 Vista Equity Partners Austin, United States
10 11 Leonard Green & Partners Los Angeles, United States
11 12 Cinven London, United Kingdom
12 13 Bain Capital Boston, United States
13 14 Apollo Global Management New York, United States
14 15 Thoma Bravo San Francisco, United States
15 16 Insight Partners New York, United States
16 17 BlackRock New York, United States
17 18 General Atlantic New York, United States
18 19 Permira Advisers London, United Kingdom
19 20 Brookfield Asset Management Toronto, Canada
20 21 EnCap Investments Houston, United States
21 22 Francisco Partners San Francisco, United States
22 23 Platinum Equity Beverly Hills, United States
23 24 Hillhouse Capital Group Hong Kong, Hong Kong
24 25 Partners Group Baar-Zug, Switzerland
The data is also saved to file.csv.

Note: you need Selenium and geckodriver installed, and in this code the geckodriver executable is expected at c:/program/geckodriver.exe.
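The parsing half of the script (BeautifulSoup + pandas.read_html) can be tried without a browser at all. This is a minimal sketch using a made-up stand-in for driver.page_source, with the same table class as the real page; the two rows of data are illustrative only:

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for driver.page_source: a tiny HTML document carrying the same
# table classes that the snippet above selects on the real page.
page = """
<html><body>
<table class="au-target pux--responsive-table">
  <tr><th>Ranking</th><th>Name</th><th>City, Country (HQ)</th></tr>
  <tr><td>1</td><td>Blackstone</td><td>New York, United States</td></tr>
  <tr><td>2</td><td>The Carlyle Group</td><td>Washington DC, United States</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(page, 'html.parser')
# select_one takes a CSS selector; both classes must be present on the table.
table = soup.select_one('table.au-target.pux--responsive-table')

# Newer pandas versions expect a file-like object rather than a raw HTML
# string, hence the StringIO wrapper.
df = pd.concat(pd.read_html(StringIO(table.prettify())))
df.to_csv('file.csv', index=False)
print(df.head())
```

Once this runs on the stand-in HTML, swapping page for the real driver.page_source gives the full table.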