
A machine learning model predicts a probability p from an input x. How the model calculates the probability is unknown.

In the example below, we have 100 values of x and p.

Can someone please show an algorithm to find all values of x for which p is 0.5?

There are two challenges:

  1. I don't know the function p = f(x). I don't wish to fit some smooth polynomial curves which will remove the noise. The noises are important.
  2. x values are discrete. So, we need to interpolate to find the desired values of x.

library(tidyverse)  # provides tibble() and ggplot()

x <- c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0, 2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3.0,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8,3.9,4.0,4.1, 4.2,4.3,4.4,4.5,4.6,4.7,4.8,4.9,5.0,5.1,5.2,5.3,5.4,5.5,5.6,5.7,5.8,5.9,6.0,6.1,6.2, 6.3,6.4,6.5,6.6,6.7,6.8,6.9,7.0,7.1,7.2,7.3,7.4,7.5,7.6,7.7,7.8,7.9,8.0,8.1,8.2,8.3, 8.4,8.5,8.6,8.7,8.8,8.9,9.0,9.1,9.2,9.3,9.4,9.5,9.6,9.7,9.8,9.9, 10.0)
p <- c(0.69385203,0.67153592,0.64868391,0.72205029,0.64917218,0.66818861,0.55532616,0.58631660,0.65013198,0.53695673,0.57401464,0.57812980,0.39889101,0.41922821,0.44022287,0.48610191,0.34235438,0.30877592,0.20408235,0.17221558,0.23667792,0.29237938,0.10278049,0.20981142,0.08563396,0.12080935,0.03266140,0.12362265,0.11210208,0.08364931,0.04746024,0.14754152,0.09865584,0.16588175,0.16581508,0.14036209,0.20431540,0.19971309,0.23336415,0.12444293,0.14120138,0.21566896,0.18490258,0.34261082,0.38338941,0.41828079,0.34217964,0.38137610,0.41641546,0.58767796,0.45473784,0.60015956,0.63484702,0.55080768,0.60981219,0.71217369,0.60736818,0.78073246,0.68643671,0.79230105,0.76443958,0.74410139,0.63418201,0.64126278,0.63164615,0.68326471,0.68154362,0.75890922,0.72917978,0.55839943,0.55452549,0.69419777,0.64160572,0.63205751,0.60118916,0.40162340,0.38523375,0.39309260,0.47021037,0.33391614,0.22400555,0.20929558,0.20003229,0.15848124,0.11589228,0.13326047,0.11848593,0.17024106,0.11184393,0.12506915,0.07740497,0.02548386,0.07381765,0.02610759,0.13271803,0.07034573,0.02549706,0.02503864,0.11621910,0.08636754)


tbl <- tibble(x, p)


# plot for visualization
ggplot(data = tbl,
       aes(x = x,
           y = p)) + 
  geom_line() + 
  geom_point() + 
  geom_hline(yintercept = 0.5) +
  theme_bw() + 
  theme(aspect.ratio = 0.4) 

The figure below shows that there are five roots.

[Figure: line plot of p against x, with a horizontal line at p = 0.5]

SiH
  • "The noises are important": this statement is hard to believe. And on the plot, I would only consider 3 significant roots. – Jul 26 '22 at 20:51
  • That is true; in practice there should be only three roots. But it is important to demonstrate that the model suggested five roots. – SiH Jul 26 '22 at 22:20
  • I am using deep learning models. Yes, that's a good point. I am trying to explore the consequences of the rough fit. – SiH Jul 27 '22 at 02:07
  • "it is important to demonstrate that the model suggested five roots": I don't believe for a second that these roots in a random signal are significant (but this is not my business). – Jul 27 '22 at 06:03


This question is clearer than your previous one: How I use numerical methods to calculate roots in R.

> I don't know the function p = f(x)

So you don't have a predict function to calculate p for new x values. This is odd, though: many statistical models have a predict method. As BenBolker mentioned, the "obvious" solution is to use uniroot (or more automated routines) to find one or all roots of the following template function:

function (x, model, p.target) predict(model, x) - p.target
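To make the template concrete, here is a minimal sketch in which a known function stands in for a model's predict method. The name pred_fun is hypothetical and does not come from the question; with a real fitted model, the exact predict call (newdata, type = "response", and so on) depends on the model class. Note that uniroot returns only one root per bracketing interval, so finding all roots needs several intervals or a more automated routine.

```r
# Hypothetical stand-in for predict(): a logistic curve that equals 0.5 at x = 2.
# With a real model you would call predict() here instead.
pred_fun <- function(x) plogis(2 - x)

# Template: predicted value minus the target probability is zero at the root.
target_fun <- function(x, p.target) pred_fun(x) - p.target

# uniroot() needs an interval whose endpoints give opposite signs;
# extra named arguments (here p.target) are passed on to target_fun.
res <- uniroot(target_fun, interval = c(0, 10), p.target = 0.5)
res$root  # close to 2, within uniroot's default tolerance
```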

But this does not work for you. You only have a set of (x, p) values that look noisy.

> I don't wish to fit some smooth polynomial curves which will remove the noise. The noises are important.

So we need to interpolate those (x, p) values to obtain a function p = f(x).

> So, we need to interpolate to find the desired values of x.

Exactly. The question is what interpolation method to use.
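As a hedged illustration of that choice, the sketch below builds two interpolants on made-up toy data with base R's approxfun (piecewise linear, which is what geom_line draws) and splinefun (a cubic interpolating spline). Both pass through every data point, so neither smooths the noise away. The helper roots_by_bracket is hypothetical: it runs uniroot on each interval where the data change sign. A cubic spline can cross 0.5 more than once inside an interval, or between two points on the same side of 0.5, so this bracketing may miss some spline roots.

```r
# Toy data (illustrative only)
xs <- c(1, 2, 3, 4)
ps <- c(0.2, 0.8, 0.4, 0.6)

f_lin <- approxfun(xs, ps)                      # piecewise-linear interpolant
f_spl <- splinefun(xs, ps, method = "natural")  # cubic interpolating spline

# Hypothetical helper: one uniroot() call per interval whose endpoints
# lie strictly on opposite sides of y0
roots_by_bracket <- function(f, x, y0) {
  d <- f(x) - y0
  i <- which(d[-length(d)] * d[-1] < 0)
  vapply(i, function(k) uniroot(function(t) f(t) - y0, c(x[k], x[k + 1]))$root,
         numeric(1))
}

roots_by_bracket(f_lin, xs, 0.5)  # approximately 1.5, 2.75, 3.5
roots_by_bracket(f_spl, xs, 0.5)  # one root in each of the same three intervals
```

With the linear interpolant each interval holds exactly one crossing, which is why it pairs naturally with sign-change bracketing; the spline gives a different curve through the same points.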

> The figure below shows that there are five roots.

This line chart is actually a linear interpolation, consisting of piecewise line segments. To find where it crosses a horizontal line, you can use the function RootSpline1 defined in my 2018 Q & A "get x-value given y-value: general root finding for linear / non-linear interpolation function".
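RootSpline1 itself lives in the linked answer and is not reproduced here. As a rough stand-in, the same idea for a piecewise-linear curve can be sketched in a few lines: find consecutive points that lie strictly on opposite sides of the target level, then solve the straight line through them for that level. The function name linear_roots is hypothetical; grid points that hit the level exactly are returned as roots too.

```r
# Hypothetical helper (not RootSpline1): every x where the piecewise-linear
# curve through (x, y) equals y0
linear_roots <- function(x, y, y0 = 0) {
  stopifnot(length(x) == length(y), !is.unsorted(x))
  d <- y - y0                     # look for zeros of d instead of y == y0
  n <- length(d)
  exact <- x[d == 0]              # data points lying exactly on y0
  i <- which(d[-n] * d[-1] < 0)   # segments with a strict sign change
  # Solve the line through (x[i], d[i]) and (x[i + 1], d[i + 1]) for d = 0
  crossing <- x[i] - d[i] * (x[i + 1] - x[i]) / (d[i + 1] - d[i])
  sort(c(exact, crossing))
}

# Toy data (illustrative only)
linear_roots(c(1, 2, 3, 4), c(0.2, 0.8, 0.4, 0.6), 0.5)
#[1] 1.50 2.75 3.50
```

Applied to the question's x and p, this sketch should give the same five crossings that RootSpline1 reports.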

RootSpline1(x, p, 0.5)
#[1] 1.243590 4.948805 5.065953 5.131125 7.550705


Thank you very much. Please add information on how to install the required package. That will help everyone.

This function is not in a package, but this is a good suggestion. I am now thinking of collecting all the functions I have written on Stack Overflow into a package.

The linked Q & A does mention an R package on GitHub, https://github.com/ZheyuanLi/SplinesUtils, but it focuses on splines of higher degree, such as cubic interpolation splines, cubic smoothing splines and regression B-splines. Linear interpolation is not covered there. So for the moment, you need to grab the function RootSpline1 from my Stack Overflow answer.

Zheyuan Li