I am trying to use Python (Selenium) to extract the data from all of the RSPO CREDITS Highcharts charts on https://rspo.org/palmtrace into a pandas DataFrame with the chart name, year, month, and values (No. of credits and Price (USD)). I have been looking at other similar posts about scraping Highcharts data, but these charts seem to be formatted a bit differently, so any help is much appreciated.

Funkeh-Monkeh
1 Answer
Since the page has two charts with 22 series each and two charts with 16 series each, a rough solution would be:
```python
import time

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://rspo.org/palmtrace")
time.sleep(2)  # crude wait for the Highcharts charts to render

# (chart index, number of series): charts 0-1 have 22 series, charts 2-3 have 16
chart_layout = [(0, 22), (1, 22), (2, 16), (3, 16)]

my_data = []
for chart, n_series in chart_layout:
    for series in range(n_series):
        base = "return window.Highcharts.charts[{}].series[{}].options".format(chart, series)
        temp = driver.execute_script(base + ".data")  # the series' data points
        temp.insert(0, driver.execute_script(base + ".name"))  # prepend the series name
        my_data.append(temp)

driver.quit()

# One row per series: column 0 is the series name, the remaining columns are its points
df = pd.DataFrame(my_data)
print(df)
```
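A possible alternative, sketched here rather than taken from the answer: fetch everything in a single `execute_script` call and build a long-format DataFrame with one row per data point, including the chart title. It assumes each chart's title is in `chart.options.title.text`, its month labels (if any) are the first x-axis `categories`, and each point is a plain number, an `[x, y]` pair, or a `{y: ...}` object; none of this has been checked against the live page. `JS_DUMP_CHARTS`, `point_value`, and `charts_to_long_df` are hypothetical names.

```python
import pandas as pd

# JavaScript to pass to driver.execute_script(). Assumes every entry of
# Highcharts.charts is either a live chart with an x-axis or undefined.
JS_DUMP_CHARTS = """
return Highcharts.charts.filter(function (c) { return c; }).map(function (c) {
  return {
    title: c.options.title ? c.options.title.text : null,
    categories: c.xAxis[0].categories || null,
    series: c.series.map(function (s) {
      return {name: s.name, data: s.options.data};
    })
  };
});
"""


def point_value(point):
    """Return the y value of a Highcharts point given in any common form."""
    if isinstance(point, dict):
        return point.get("y")  # {y: ...} object
    if isinstance(point, (list, tuple)):
        return point[-1]  # [x, y] pair: y is the last element
    return point  # bare number, or None for a gap


def charts_to_long_df(charts):
    """Flatten the structure returned by JS_DUMP_CHARTS into one row per point."""
    rows = []
    for chart_no, chart in enumerate(charts):
        categories = chart.get("categories") or []
        for series in chart["series"]:
            for i, point in enumerate(series.get("data") or []):
                rows.append({
                    "chart_no": chart_no,
                    "chart": chart.get("title"),
                    "series": series["name"],
                    # use the axis category (e.g. a month label) when there is one
                    "x": categories[i] if i < len(categories) else i,
                    "value": point_value(point),
                })
    return pd.DataFrame(rows, columns=["chart_no", "chart", "series", "x", "value"])
```

With a Selenium driver already on the page, `charts = driver.execute_script(JS_DUMP_CHARTS)` followed by `df = charts_to_long_df(charts)` should give the long table. Reading the series from the page avoids hard-coding 22 and 16, and one script call replaces a browser round trip per series.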

Duc Nguyen
- Do you know how I can pull the values out into a pandas `dataframe`? I updated my question to reflect this. – Funkeh-Monkeh Sep 28 '22 at 19:04
- I have edited my code now. – Duc Nguyen Sep 28 '22 at 19:10
- Thanks, but it doesn't include the chart title, year, and month. Do you know how I can have those appended to the `dataframe` as well? – Funkeh-Monkeh Sep 28 '22 at 19:21
- I have added the years. Months are the columns. The 1st chart is the first 22 rows, the 2nd chart is the next 22 rows, the 3rd chart is the next 16 rows, and the 4th is the rest. – Duc Nguyen Sep 28 '22 at 21:44
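If you keep the answer's wide DataFrame, the row layout described in the last comment (22, 22, 16, and 16 rows per chart) can be turned into chart labels, and the point columns can then be melted into rows. This is a minimal sketch assuming column 0 holds the series name and every other column holds one point's value; `label_and_melt` and the chart names you pass in are hypothetical, and `month_index` is only the column position, not a verified calendar month.

```python
import pandas as pd


def label_and_melt(df, block_sizes, chart_names):
    """Tag each row of the wide DataFrame with its chart, then reshape to long form.

    Assumes rows are grouped by chart in the order given by block_sizes,
    column 0 is the series name, and the remaining columns are point values.
    """
    if len(block_sizes) != len(chart_names):
        raise ValueError("need one chart name per block")
    if sum(block_sizes) != len(df):
        raise ValueError("block sizes must add up to the number of rows")
    # Repeat each chart name once per row in its block
    labels = [name for name, size in zip(chart_names, block_sizes) for _ in range(size)]
    wide = df.rename(columns={0: "series"})  # returns a copy, df is untouched
    wide.insert(0, "chart", labels)
    # Every non-id column becomes a (month_index, value) pair
    return wide.melt(id_vars=["chart", "series"], var_name="month_index", value_name="value")
```

For the page in the question this would be called as `label_and_melt(df, [22, 22, 16, 16], ["chart 1", "chart 2", "chart 3", "chart 4"])`, substituting the real chart titles.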