
I'm working on a regression model visualizer, and my current bottleneck is plotting continuous functions. The program loads data from a file containing (x, y) values, fits the regression model using matrix arithmetic, and then passes the regression model's points to a ScatterChart for plotting. Each plot point of the regression model is determined by the model's coefficients and by a step, dx, that sets the spacing of the model points between two original data points.

Essentially, if dx is set to 0.1 and x1 = 0, x2 = 10, there will be 99 model points strictly between x1 and x2. If you are interested, here's the math for the model: https://newonlinecourses.science.psu.edu/stat501/node/382/
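
To make the sampling concrete, here is a minimal sketch of how model points could be generated between two x values. The names `evalPoly` and `interiorPoints` are hypothetical and are not taken from `PolynomialGraph`, whose implementation is not shown in this post; the coefficient order (constant term first) is also an assumption.

```scala
object SamplingSketch {

  /** Evaluates c(0) + c(1)*x + c(2)*x^2 + ... using Horner's scheme. */
  def evalPoly(coeffs: Seq[Double], x: Double): Double =
    coeffs.foldRight(0.0)((c, acc) => c + x * acc)

  /** Model points strictly between x1 and x2, spaced dx apart. */
  def interiorPoints(coeffs: Seq[Double], x1: Double, x2: Double, dx: Double): Seq[(Double, Double)] = {
    require(dx > 0 && x2 > x1, "need dx > 0 and x2 > x1")
    // Count the steps once and index them, so rounding errors do not accumulate in x.
    val steps = math.round((x2 - x1) / dx).toInt
    (1 until steps).map { i =>
      val x = x1 + i * dx
      (x, evalPoly(coeffs, x))
    }
  }
}
```

Indexing the steps with an integer, rather than repeatedly adding dx to a running x, keeps the number of points predictable for small dx.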

The PolynomialGraph shown in the code is essentially a regression model. It knows how to return the original data points up to a maximum x value and how to calculate the model's plot points up to that maximum. All values are (Double, Double) tuples.

My problem is that, depending on how many plot points I calculate, the drawn plot might not look continuous all the way. A single plot point has a constant size (set in CSS), so if the ScatterChart is wide and the dx between two plot points (x1, y1) and (x2, y2) is too large relative to the range of the x axis, there will be visible gaps in the plot. The size and shape of the original data points and the model's points are set in the CSS code below.

For example, take (x1, y1) = (0, 10), (x2, y2) = (100, 110) and dx = 100. Then no points at all are drawn between (x1, y1) and (x2, y2).

Is there a reasonable way to fix this problem? I've tried my best with the ScalaFX and JavaFX APIs, but I want to believe that there is a better fix than estimating how many plot points fall into a single pixel and deriving the number of points to draw from that.
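
For reference, the per-pixel idea mentioned above could look roughly like this. It is only a sketch: `dxPerPixel` and `pointCount` are hypothetical helpers, and the plot width in pixels is assumed to be available from the laid-out chart, which is not shown here.

```scala
object PixelStepSketch {

  /** Width of one horizontal pixel in data units. */
  def dxPerPixel(xMin: Double, xMax: Double, plotWidthPx: Double): Double = {
    require(plotWidthPx > 0 && xMax > xMin, "need a positive width and xMax > xMin")
    (xMax - xMin) / plotWidthPx
  }

  /** Number of model points needed for one point per horizontal pixel. */
  def pointCount(xMin: Double, xMax: Double, plotWidthPx: Double): Int =
    math.ceil((xMax - xMin) / dxPerPixel(xMin, xMax, plotWidthPx)).toInt + 1
}
```

Because the step is derived from the axis range, it grows and shrinks with the data automatically. It does not account for steep sections of the curve, where consecutive points can still be far apart vertically.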

Another way could be to find the intervals where dy is greatest and use that to determine how many points are needed.
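
A rough sketch of that dy-based idea, under stated assumptions: `refine` is a hypothetical helper, `maxDy` is a tolerance in data units that would have to be chosen to match the symbol size, and the refinement is a single pass, so it is a heuristic rather than a guarantee that every vertical gap ends up below `maxDy`.

```scala
object AdaptiveSketch {

  /** Adds extra sample points to intervals of xs where f changes by more than maxDy. */
  def refine(f: Double => Double, xs: Seq[Double], maxDy: Double): Seq[(Double, Double)] = {
    require(maxDy > 0 && xs.nonEmpty, "need maxDy > 0 and at least one x value")
    val inner = xs.zip(xs.tail).flatMap { case (a, b) =>
      val dy = math.abs(f(b) - f(a))
      // Split the interval into enough pieces that an average piece rises by at most maxDy.
      val pieces = math.max(1, math.ceil(dy / maxDy).toInt)
      // Emit a and the new points inside (a, b); b is emitted by the next pair.
      (0 until pieces).map { k =>
        val x = a + (b - a) * k / pieces
        (x, f(x))
      }
    }
    val last = (xs.last, f(xs.last))
    inner :+ last
  }
}
```

Flat intervals keep their original spacing, while steep intervals receive more points, so the total point count stays far below what a uniformly tiny dx would require.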

Other than that, I really don't have a clue how to fix this problem dynamically. I can always set dx to around 10E-9, but that doesn't scale well at all. If the range of x values is massive, dx can easily be as large as 100, whereas if the maximum value happens to be one, dx may need to be around 10E-9.

```scala
package regression

import scalafx.collections.ObservableBuffer
import scalafx.scene.chart.{NumberAxis, ScatterChart, XYChart}

object Plotter {

  def drawGraph(xName: String, yName: String, graph: PolynomialGraph): ScatterChart[Number, Number] = {
    val xAxis = NumberAxis()
    val yAxis = NumberAxis()
    // Original data points; values have to be in order x, y.
    val pData = XYChart.Series[Number, Number](
      xName,
      ObservableBuffer(graph.dataPointsUntilMaxX().map(z => XYChart.Data[Number, Number](z._2, z._1)): _*))
    // Plot points of the regression model.
    val model = XYChart.Series[Number, Number](
      yName,
      ObservableBuffer(graph.calculatePlotPoints().map(z => XYChart.Data[Number, Number](z._1, z._2)): _*))
    new ScatterChart(xAxis, yAxis, ObservableBuffer(model, pData))
  }

}
```
```css
.default-color0.chart-symbol {
    -fx-background-color: blue;
    -fx-background-radius: 0.5px;
    -fx-opacity: 1.0;
    -fx-padding: 1px;
}

.default-color1.chart-symbol {
    -fx-background-color: red, white;
    -fx-background-insets: 0, 2;
    -fx-background-radius: 2px;
    -fx-padding: 3px;
}
```

Maximum x value of the original data is 23.5

The first image has a dx of 0.001, meaning that 23500 points are calculated for the model's plot.

The second image has a dx of 0.1. Now only 235 points are calculated for the plot, and there is visible space between any two points on the curve.

  • Take a look at [this answer](https://stackoverflow.com/a/26805320/2593574) and see if that is more what you're looking for. – Mike Allen Mar 25 '19 at 12:07
  • @MikeAllen Hmm. A fundamental question that determines whether the solution scales well is how we determine the _number_ of points needed for the regression model. Since the regression model is just an nth-order polynomial, one solution is to find the maximum of the model's second derivative, i.e. how much the slope changes. We know the range of values from the original data points (x, y), so a safe solution could be to set dx equal to the inverse of the maximum value of the second derivative. – Cartesian Bear Mar 25 '19 at 15:55
  • I'm not sure exactly what you mean. The number of data points required to create a regression line? How the number of data points should be displayed? Whether the chart can handle the number of points? Can you elaborate? – Mike Allen Mar 25 '19 at 15:59
  • @MikeAllen I added images with descriptions to the post. My problem is to find a scalable way to calculate _enough_ points that the plot _seems_ to be connected. We need only enough points that there is no visible space between any two points. Once we know how to change dx accordingly, we are done. – Cartesian Bear Mar 25 '19 at 16:16
  • Ah, OK. I see what you're getting at. It seems you would want to identify the minimum and maximum X & Y values of the data points, giving you the ranges for both axes. You could then round each pair to get the axis bounds. Dividing the ranges by the corresponding number of pixels on that axis for the chart would then give you the X and Y values per pixel. You could just use the X value as the dx value, although vertically steep curves would still pixelate. Let me think about this some more... – Mike Allen Mar 25 '19 at 16:29
  • @MikeAllen Letting dx be the maximum x value divided by the screen width does the job when the second derivative remains small, i.e. when there are no vertically steep curves. My only solution for steep curves revolves around some new external library that would allow finding the second and third derivatives of the regression curve, making it possible to calculate more points around the vertically steep intervals. – Cartesian Bear Mar 25 '19 at 19:32
  • You could use the line chart solution referenced in my first comment, and have the regression line join the dots. Given the resolution of pixels, this should look just fine. This way, you would only need to consider dx as the range per pixel on the X-axis. – Mike Allen Mar 28 '19 at 17:30
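
On the derivative question raised in the comments: since the model is a polynomial, its derivatives can be computed directly from the coefficients. The sketch below is illustrative only. The coefficient order (constant term first) is an assumption, and `localDx` is a hypothetical helper that uses the first derivative (the slope) rather than the second derivative discussed above, together with assumed pixels-per-unit scales for both axes.

```scala
object DerivativeSketch {

  /** Coefficients of p'(x), given the coefficients c(0) + c(1)*x + c(2)*x^2 + ... of p(x). */
  def derivative(coeffs: Seq[Double]): Seq[Double] =
    coeffs.zipWithIndex.drop(1).map { case (c, i) => c * i }

  /** Evaluates a polynomial using Horner's scheme. */
  def evalPoly(coeffs: Seq[Double], x: Double): Double =
    coeffs.foldRight(0.0)((c, acc) => c + x * acc)

  /**
   * Step in data units that moves roughly gapPx pixels along the curve,
   * treating the curve as locally straight with the given slope.
   */
  def localDx(slope: Double, pxPerX: Double, pxPerY: Double, gapPx: Double): Double =
    gapPx / math.hypot(pxPerX, slope * pxPerY)
}
```

Applying `derivative` twice gives the second derivative, so the same helper covers the approach suggested in the comments.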

0 Answers