
I am trying to scrape MMA odds data from this site, which means parsing a few Highcharts charts. I go to the site and use Selenium to click on +420 in the Artem Lobov row of the Pinnacle column. This opens a pop-out chart, and I then switch to the active element. I would like to capture the graph drawn by highcharts in response to the click.

I use selenium in the following manner:

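import time
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# driver is an open WebDriver; pin_id is the id of the odds cell to click
actions = ActionChains(driver)
actions.move_to_element(driver.find_element(By.ID, pin_id))
actions.click()
actions.perform()
time.sleep(3)  # wait for the pop-out chart to render
active = driver.switch_to.active_element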

I was able to click the link and get the chart but I am a bit lost on how highcharts works.
I am trying to parse highcharts-series-group here and get the values in the chart.

I believe the data can be found by:

import bs4
# page_source is already an HTML string, so pass it straight to BeautifulSoup
soup = bs4.BeautifulSoup(driver.page_source, "lxml")
data = soup.find_all('g', {"class":"highcharts-series-group"})[-1].find_all("path")

However, this returns path elements, and it is not clear how a chart is created from that data. As noted in the comments, it appears to be SVG.

During inspection, the data appears to be in <g class="highcharts-series" and <g class="highcharts-series-tracker, but it's not clear how Highcharts graphs it from this data.

How does Highcharts display the graph from the stored data? Is there a clean way to get the data from the highcharts-series-group as displayed?

Michael WS
  • Possible duplicate of [Can I scrape the raw data from highcharts.js?](http://stackoverflow.com/questions/39305877/can-i-scrape-the-raw-data-from-highcharts-js) – eli-bd May 01 '17 at 23:09
  • It is looking like they are storing the data in the DOM directly. If you inspect the chart you will see a div with all the data in it as an object you can pull out. The div ID is "event-swing-container". If you want to extract the HTML table of the betting lines that is another question altogether. – wergeld May 01 '17 at 23:10
  • Thank you very much for responding. I was trying to parse the path that i believe is here {"class":"highcharts-series-group"}. It seems to be calling Translate() – Michael WS May 02 '17 at 00:01
  • Take for example click on over/under on https://www.bestfightodds.com/events/ufc-fight-night-108-swanson-vs-lobov-1258 when I inspect in firefox/chrome, it looks like the data is in highcharts-series – Michael WS May 02 '17 at 00:05
  • Paths rendered by Highcharts use SVG coordinates, not real values. In short: data in JS -> translation in JS from values to SVG coordinates -> rendering SVG elements. In other words, it's not an easy task to get the real data from just SVG coordinates. The easiest way to get this data would be to use.. `Highcharts.charts[index]`, like this: `Highcharts.charts[0].series[0].options.data`. I guess Selenium won't allow this. – Paweł Fus May 02 '17 at 10:05
  • I am confused how would you have access to that – Michael WS May 02 '17 at 12:11
  • Why are you not able to read from the div for id="event-swing-container"? It has all the data in `data-moves` field. The other chart has a div with id="event-outcome-container" that has `data-outcomes` field that contains the data series. I see no reason to decompose the SVG when the real data is right there in the div. You could even take the `data-moves` and `data-outcomes` contents to make your own charts. – wergeld May 05 '17 at 17:46
  • Ah, I see now - you want the popup chart. Looks like the response is coming back encrypted from https://www.bestfightodds.com/api?f=ggd&b=9&m=13467&p=1. So, they probably don't like you scraping it either. – wergeld May 05 '17 at 17:55
  • there's a function that translates that to a path and I have the paths – Michael WS May 05 '17 at 17:59
  • @MichaelWS when you say "I would like to capture the graph drawn by highcharts"; what do you specifically mean by that and what format do you want your captured data to be in? – Kushal Bhalaik May 06 '17 at 06:10
  • I want a x,y data series of date, money line. I really don't care on format other than that – Michael WS May 06 '17 at 19:15

3 Answers


I could not figure out how to convert SVG data into what is displayed on the graph you mentioned, but wrote the following Selenium Python script:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get('https://www.bestfightodds.com/events/ufc-fight-night-108-swanson-vs-lobov-1258')
# Click the odds cell from the question to open the pop-out chart
actions = webdriver.ActionChains(driver)
actions.move_to_element(driver.find_element(By.ID, 'oID1013467091'))
actions.click()
actions.perform()
time.sleep(3)  # give the chart time to load
active = driver.switch_to.active_element
# Highcharts stores the chart's index in the data-highcharts-chart attribute
chart_number = driver.find_element(By.ID, 'chart-area').get_attribute('data-highcharts-chart')
chart_data = driver.execute_script('return Highcharts.charts[' + chart_number + '].series[0].options.data')
for point in chart_data:
    # oneDecToML is the page's own JS helper; it converts the raw y value to the displayed one
    e = driver.execute_script('return oneDecToML(' + str(point.get('y')) + ')')
    print(point.get('x'), e)

Here we use the Highcharts API together with some JavaScript from the page source (the oneDecToML function), which converts the server response for this chart into the values we see on the graph.
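If you would rather finish the conversion in Python and end up with the (date, money line) series asked for in the question, something like the sketch below may work. It rests on two assumptions that are not confirmed by the page: that the raw y values are decimal odds (so the standard decimal-to-moneyline formula stands in for oneDecToML), and that the x values are millisecond timestamps, as is usual on a Highcharts datetime axis. The names decimal_to_moneyline and to_series are made up for this example.

```python
from datetime import datetime, timezone


def decimal_to_moneyline(decimal_odds):
    """Convert decimal odds to an American moneyline (standard formula).

    Assumption: this mirrors what the page's oneDecToML does; it is not
    copied from the site's JavaScript.
    """
    if decimal_odds <= 1:
        raise ValueError("decimal odds must be greater than 1")
    if decimal_odds >= 2:
        # Underdog: profit on a 100 stake
        return round((decimal_odds - 1) * 100)
    # Favourite: stake needed to win 100, shown as a negative number
    return round(-100 / (decimal_odds - 1))


def to_series(chart_data):
    """Turn points like {'x': ms_epoch, 'y': decimal_odds} into (datetime, moneyline)."""
    return [
        (datetime.fromtimestamp(p["x"] / 1000, tz=timezone.utc),
         decimal_to_moneyline(p["y"]))
        for p in chart_data
    ]
```

Calling to_series(chart_data) on the list returned by execute_script would then give the series directly, provided both assumptions hold for your page.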

arcquim
  • amazing. Thank you – Michael WS May 07 '17 at 14:20
  • How did you figure out to use the oneDecToML function? I'm trying to use this on another page and can't figure out how to convert the y-data yet. – wordsforthewise Jun 13 '17 at 18:13
  • @wordsforthewise, well, here I did some research on the client-side code. The page (mentioned in the question and my answer) has some js that converts the raw data coming from the server to the format used for drawing the highcharts lines. This js contains the `oneDecToML` function, and it is called for the graph mentioned in the answer. At first I tried not to use any page sources, but then I had only the raw data (the one coming from the server). Then I put a breakpoint on XHR and debugged the client code, which is how I found out this method was called. The thing is, your page may not have this function. – arcquim Jun 14 '17 at 05:19
  • Yeah, on my page it didn't have that function, but the data didn't need to be processed any more. Thanks. I didn't know about XHR breakpoints. Any good resources you have for learning that? I found this which looks good: http://blittle.github.io/chrome-dev-tools/network/xhr-breakpoints.html – wordsforthewise Jun 15 '17 at 16:46
  • @wordsforthewise, I do not really think there is something special here to learn about them, it's an easy-to-use tool. So the link you attached covers all the cases to learn :) – arcquim Jun 16 '17 at 05:13

Another method is to reconstruct the data from the SVG path coordinates described above using the linear equation y = mx + b. If some actual data values are known (data points are often displayed on Highcharts charts), the slope can be calculated very accurately. With the intercept known (see below), I ran a regression on 3 known points and it reproduced them precisely (zero error).

A further method, described in detail here, is to reconstruct the data from the highcharts-yaxis-labels, although its suitability depends on the data and the required accuracy. Extract each label's y attribute and text value, treat them as x and y respectively, and run a regression analysis.

y="148"... >-125<
y="117"... >+100<
y="85"... >+120<
y="54"... >+140<
y="23"... >+160<

It is useful to plot the values in a chart, especially in this case, because the relationship is not linear over the whole range. Fortunately, discarding the -125 value gives a nice straight line, and none of the values are less than 100.

x   y
117 100
85  120
54  140
23  160

x coefficient   -0.638938504720592
R^2             0.999938759887726

The x coefficient is the slope of the line, so m = -0.638938504720592.
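For anyone who wants to reproduce the slope without a spreadsheet, here is a minimal least-squares sketch in plain Python using the four label points from the table above. The fit_line helper is just an illustration, not part of Highcharts or any library. It also returns the fitted intercept, which this answer does not use, because the intercept is chosen separately.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = m * x + b; returns (m, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (SVG y of the axis label, label value), with the -125 label dropped
labels = [(117, 100), (85, 120), (54, 140), (23, 160)]
slope, fitted_intercept = fit_line(labels)
print(slope)
```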

What about the intercept? The most common coordinate system has its origin at the bottom left, but SVG puts the origin at the top left. This means the intercept has to be taken at the top of the chart. Because this dataset has a value for the top of the chart, the easiest way is to use that top value: b = 160.

Extract the data list using your preferred method (not described in this answer) and reconstruct the data with the linear equation, substituting each SVG y coordinate for x.

e.g. ...L 999999 101 ....

y = -0.638938504720592 * 101 + 160 ≈ 95
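As a rough sketch of that last step, the code below pulls the coordinate pairs out of a path's d attribute and applies the equation to each y. It assumes the path uses only absolute M and L commands with plain decimal numbers, which is what the series paths looked like here but is not guaranteed for every chart. The helper names and the sample path string in the usage note are invented for illustration.

```python
import re

SLOPE = -0.638938504720592  # from the regression on the axis labels
INTERCEPT = 160             # value at the top of the chart, as chosen in this answer


def path_points(d):
    """Extract (x, y) pairs from an SVG path made of absolute M/L commands."""
    number = r"(-?\d+(?:\.\d+)?)"
    pairs = re.findall(r"[ML]\s*" + number + r"[\s,]+" + number, d)
    return [(float(x), float(y)) for x, y in pairs]


def to_values(d):
    """Map each SVG y coordinate to a chart value with value = m * y + b."""
    return [(x, SLOPE * y + INTERCEPT) for x, y in path_points(d)]
```

For example, to_values("M 0 101 L 10.5 85") gives a value of about 95.5 for the first point, matching the hand calculation above. The x coordinates are left in SVG units; they would need their own linear mapping from the x-axis labels.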

Reconstructing the data from the y-axis labels may not be as accurate as using actual data values. If you are lucky, the y-axis labels will sit on a nice scale and you get precise values, but each label position can be up to half a unit out at the top and bottom of the range, so the worst case is (1/2 + 1/2) / 94 = 1.06% in this example; the actual error is likely much less.

flywire

When I use the CSS selector "g.highcharts-axis-labels tspan" it returns all the fighters' names, and when I use "g.highcharts-data-labels tspan" it returns all the percentages for line movement.

So you should be able to use something like

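from selenium.webdriver.common.by import By

labels = driver.find_elements(By.CSS_SELECTOR, "g.highcharts-axis-labels tspan")
data = driver.find_elements(By.CSS_SELECTOR, "g.highcharts-data-labels tspan")
# Pair each name with its percentage; .text gives the string inside the element
for label, value in zip(labels, data):
    print("Fighter: " + label.text + " (" + value.text + ")")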

An alternative is to use the command that Paweł Fus recommended,

Highcharts.charts[0].series[0].options.data

You should be able to execute that using JavascriptExecutor (execute_script in the Python bindings) and it returns an array of arrays. You can then parse through that and get the data you want. It's up to you...
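A minimal sketch of that approach from Python, assuming driver is a Selenium WebDriver that already has the page (and the chart) loaded. The function name highcharts_series is made up for this example. It collects every series from every chart rather than only charts[0].series[0], and skips empty slots, since Highcharts leaves undefined entries in Highcharts.charts for charts that have been destroyed.

```python
def highcharts_series(driver):
    """Return one list per chart, each holding {'name': ..., 'data': ...} dicts."""
    script = """
        return Highcharts.charts
            .filter(Boolean)  // skip slots left behind by destroyed charts
            .map(function (chart) {
                return chart.series.map(function (s) {
                    return {name: s.name, data: s.options.data};
                });
            });
    """
    # Selenium converts the returned JS arrays/objects to Python lists/dicts
    return driver.execute_script(script)
```

The shape of each data entry depends on how the page configured the series: it may be plain numbers, [x, y] pairs, or objects with x and y keys, so check one entry before parsing.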

JeffC
  • If you click the graph, I don't see tspan anywhere. I see span and it looks like an svg. – Michael WS May 02 '17 at 21:06
  • I was referring to the "Line movement" chart at the bottom of the page which is also a highcharts graph. You need to update your question with more details and specifics on what exactly you are trying to do, etc. – JeffC May 02 '17 at 21:59
  • sorry Jeff for the confusion, I have edited the question for clarity – Michael WS May 02 '17 at 22:17