
I have Benford test results in a DataFrame, `test_show`:

             Expected  Counts     Found        Dif    AbsDif    Z_score
    Sec_Dig
    0        0.119679    4318  0.080052  -0.039627  0.039627  28.347781
    1        0.113890    2323  0.043066  -0.070824  0.070824  51.771489
    2        0.108821    1348  0.024991  -0.083831  0.083831  62.513122
    3        0.104330    1298  0.024064  -0.080266  0.080266  60.975864
    4        0.100308    3060  0.056730  -0.043579  0.043579  33.683738
    5        0.096677    6580  0.121987   0.025310  0.025310  19.884178
    6        0.093375   10092  0.187097   0.093722  0.093722  74.804141
    7        0.090352    9847  0.182555   0.092203  0.092203  74.687841
    8        0.087570    8439  0.156452   0.068882  0.068882  56.587749
    9        0.084997    6635  0.123007   0.038010  0.038010  31.646817

I'm trying to plot the Benford result using Plotly as below.


Here is the code that I have tried so far:

import plotly.graph_objects as go


fig = go.Figure()
fig.add_trace(go.Bar(x=test_show.index,
                y=test_show.Found,
                name='Found',
                marker_color='rgb(55, 83, 109)',
                # color="color"
                ))
fig.add_trace(go.Scatter(x=test_show.index,
                y=test_show.Expected,
                mode='lines+markers',
                name='Expected'
                ))

fig.update_layout(
    title="Benford's Law",
    xaxis=dict(
        title='Digits',
        tickmode='linear',
        titlefont_size=16,
        tickfont_size=14),
    yaxis=dict(
        title='% Percentage',
        titlefont_size=16,
        tickfont_size=14,
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ))
fig.show()

How can I add a confidence interval for `test_show["Expected"]` to the plot?


1 Answer


As of Python 3.8 you can use `NormalDist` to calculate a confidence interval, as explained in detail here. With a slight adjustment to that approach you can include it in your setup with `fig.add_traces()` using two `go.Scatter()` traces, and then set `fill='tonexty', fillcolor='rgba(255, 0, 0, 0.2)'` for the last one, like this:

CI = confidence_interval(df.Expected, 0.95)
fig.add_traces([go.Scatter(x = df.index, y = df['Expected']+CI,
                           mode = 'lines', line_color = 'rgba(0,0,0,0)',
                           showlegend = False),
                go.Scatter(x = df.index, y = df['Expected']-CI,
                           mode = 'lines', line_color = 'rgba(0,0,0,0)',
                           name = '95% confidence interval',
                           fill='tonexty', fillcolor = 'rgba(255, 0, 0, 0.2)')])

Please note that this approach calculates a confidence interval from the very limited `df.Expected` series, which might not be what you're looking to do here. So let me know how this initial suggestion works out for you and then we can take it from there.
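Since `Expected` contains theoretical Benford proportions rather than a random sample, another option worth considering is a per-digit binomial confidence interval based on the total number of observations (the sum of `Counts`). This is only a sketch under that assumption; the helper name `binomial_ci` and the normal-approximation formula `z * sqrt(p * (1 - p) / n)` are mine, not part of the answer above:

```python
import math

# Values taken from the question's table
expected = [0.119679, 0.113890, 0.108821, 0.104330, 0.100308,
            0.096677, 0.093375, 0.090352, 0.087570, 0.084997]
counts = [4318, 2323, 1348, 1298, 3060, 6580, 10092, 9847, 8439, 6635]

def binomial_ci(p, n, z=1.959964):
    # Half-width of the normal-approximation 95% CI for a proportion p
    # observed over n draws: z * sqrt(p * (1 - p) / n)
    return z * math.sqrt(p * (1 - p) / n)

n_total = sum(counts)  # total number of observations
half_widths = [binomial_ci(p, n_total) for p in expected]

# These half-widths can be fed to the same two go.Scatter() traces shown
# above: y = Expected + h for the upper band, y = Expected - h for the lower.
```

With roughly 54,000 observations this band is much tighter than one computed from the ten `Expected` values themselves, which may be closer to what a Benford conformity plot usually shows.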

Plot: a bar chart of `Found` with the `Expected` line and the shaded 95% confidence band.

Complete code:

import plotly.graph_objects as go
import pandas as pd
from statistics import NormalDist

def confidence_interval(data, confidence=0.95):
    # Fit a normal distribution to the sample and return the half-width
    # of the confidence interval: z * s / sqrt(n - 1)
    dist = NormalDist.from_samples(data)
    z = NormalDist().inv_cdf((1 + confidence) / 2.)
    h = dist.stdev * z / ((len(data) - 1) ** .5)
    return h


df = pd.DataFrame({'Expected': {0: 0.119679,
                      1: 0.11389,
                      2: 0.108821,
                      3: 0.10432999999999999,
                      4: 0.10030800000000001,
                      5: 0.096677,
                      6: 0.093375,
                      7: 0.090352,
                      8: 0.08757000000000001,
                      9: 0.084997},
                     'Counts': {0: 4318,
                      1: 2323,
                      2: 1348,
                      3: 1298,
                      4: 3060,
                      5: 6580,
                      6: 10092,
                      7: 9847,
                      8: 8439,
                      9: 6635},
                     'Found': {0: 0.080052,
                      1: 0.043066,
                      2: 0.024991,
                      3: 0.024064,
                      4: 0.056729999999999996,
                      5: 0.12198699999999998,
                      6: 0.187097,
                      7: 0.182555,
                      8: 0.156452,
                      9: 0.12300699999999999},
                     'Dif': {0: -0.039626999999999996,
                      1: -0.070824,
                      2: -0.08383099999999999,
                      3: -0.08026599999999999,
                      4: -0.043579,
                      5: 0.02531,
                      6: 0.093722,
                      7: 0.092203,
                      8: 0.068882,
                      9: 0.03801},
                     'AbsDif': {0: 0.039626999999999996,
                      1: 0.070824,
                      2: 0.08383099999999999,
                      3: 0.08026599999999999,
                      4: 0.043579,
                      5: 0.02531,
                      6: 0.093722,
                      7: 0.092203,
                      8: 0.068882,
                      9: 0.03801},
                     'Z_scoreSec_Dig': {0: 28.347781,
                      1: 51.771489,
                      2: 62.513121999999996,
                      3: 60.975864,
                      4: 33.683738,
                      5: 19.884178,
                      6: 74.804141,
                      7: 74.687841,
                      8: 56.587749,
                      9: 31.646817}})

test_show = df
fig = go.Figure()
fig.add_trace(go.Bar(x=test_show.index,
                y=test_show.Found,
                name='Found',
                marker_color='rgb(55, 83, 109)',
                # color="color"
                ))
fig.add_trace(go.Scatter(x=test_show.index,
                y=test_show.Expected,
                mode='lines+markers',
                name='Expected'
                ))

fig.update_layout(
    title="Benford's Law",
    xaxis=dict(
        title='Digits',
        tickmode='linear',
        titlefont_size=16,
        tickfont_size=14),
    yaxis=dict(
        title='% Percentage',
        titlefont_size=16,
        tickfont_size=14,
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ))

CI = confidence_interval(df.Expected, 0.95)

fig.add_traces([go.Scatter(x = df.index, y = df['Expected']+CI,
                           mode = 'lines', line_color = 'rgba(0,0,0,0)',
                           showlegend = False),
                go.Scatter(x = df.index, y = df['Expected']-CI,
                           mode = 'lines', line_color = 'rgba(0,0,0,0)',
                           name = '95% confidence interval',
                           fill='tonexty', fillcolor = 'rgba(255, 0, 0, 0.2)')])

fig.show()
  • Thanks for the answer. Are `mean_confidence_interval` and `confidence_interval` the same? – Ailurophile Nov 23 '21 at 17:32
  • @Pluviophile You're welcome! I'm not quite sure what you mean though.... Where is mean_confidence_interval? – vestland Nov 23 '21 at 17:37
  • Since I'm using Python 3.7, I can't use `NormalDist`. I went through the link you shared in the answer, and in its accepted answer I found `mean_confidence_interval` – Ailurophile Nov 23 '21 at 17:56
  • I'm not sure if I can use that to calculate confidence intervals – Ailurophile Nov 23 '21 at 17:57
  • @Pluviophile Ah, I see.... Not on the PC right now, but give it a try and compare the numbers. I'll take a closer look later tonight or tomorrow – vestland Nov 23 '21 at 18:04
  • @Pluviophile I'm getting `0.007557` in your example. And then of course I'm adding and subtracting that from every observation of `df.Expected` to illustrate the interval. – vestland Nov 23 '21 at 18:06
  • Using `mean_confidence_interval` I got `m, m-h, m+h = (0.09999999999999996, 0.09172440121778236, 0.10827559878221757)` – Ailurophile Nov 24 '21 at 07:02
  • We got different results for confidence intervals – Ailurophile Nov 24 '21 at 07:03
  • @Pluviophile And you are using the very same data sample as in your question? – vestland Nov 24 '21 at 07:12
  • Yes, the same sample I added in the question – Ailurophile Nov 24 '21 at 07:24
  • @Pluviophile Then the difference has to be caused by `n` vs `n-1`, as per the comment: `This assumes the sample size is big enough (let's say more than ~100 points) in order to use the standard normal distribution rather than the student's t distribution to compute the z value.` – vestland Nov 24 '21 at 07:33
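For what it's worth, the answer's formula can be reproduced on Python 3.7 (which has no `NormalDist`) using only the standard library. The helper below is my own sketch, not from the thread: the name `confidence_interval_37` and the hardcoded 95% z-value are assumptions; for other confidence levels you would need something like `scipy.stats.norm.ppf((1 + confidence) / 2)`.

```python
import statistics

def confidence_interval_37(data, z=1.959964):
    # Same formula as the answer's helper: half-width = z * s / sqrt(n - 1),
    # with s the sample standard deviation. The default z is the 97.5th
    # percentile of the standard normal, i.e. a 95% two-sided interval.
    s = statistics.stdev(data)
    return s * z / ((len(data) - 1) ** 0.5)

# The Expected column from the question
expected = [0.119679, 0.113890, 0.108821, 0.104330, 0.100308,
            0.096677, 0.093375, 0.090352, 0.087570, 0.084997]

h = confidence_interval_37(expected)  # ≈ 0.0076, close to the 0.007557 quoted above
```

The `mean_confidence_interval` helper from the linked answer divides by `sqrt(n)` (and uses the t distribution), which explains the small discrepancy discussed in the comments.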