I am creating a scatter density plot for my bachelor's thesis and have run into the following problem: I would like to count the points inside the error zone, as well as the points above and below it. However, my code reports that there are no points below the error zone. I use the following code to draw the error zone and count the points:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
# Sort data in ascending order
data1_sorted = data1.sort_values(by='data')
# Use of tau values from the already opened data frame data1_sorted
tau_values = data1_sorted['data'].values * 0.15 + 0.05
# Plotting the expected error zone (only the boundary lines)
ax.plot(tau_values + 0.05, data1_sorted['data'].values, color='green', linestyle='--')
ax.plot(tau_values - 0.05, data1_sorted['data'].values, color='green', linestyle='--')
# Count points within the error zone for data1
points_inside_zone_data1 = (data1['data'].values >= (np.min(tau_values) - 0.05)) & (data1['data'].values <= (np.max(tau_values) + 0.05))
# Count points within the error zone for data2
points_inside_zone_data2 = (data2['data'].values >= (np.min(tau_values) - 0.05)) & (data2['data'].values <= (np.max(tau_values) + 0.05))
# Total number of points within the error zone
total_points_inside_zone = np.sum(points_inside_zone_data1) + np.sum(points_inside_zone_data2)
# Total number of points in both data sets
total_points = len(data1) + len(data2)
# Percentage of points within the error zone
percentage_inside_zone = (total_points_inside_zone / total_points) * 100
# Count points above the error zone for data1
points_above_zone_data1 = data1['data'].values > (np.max(tau_values) + 0.05)
# Count points above the error zone for data2
points_above_zone_data2 = data2['data'].values > (np.max(tau_values) + 0.05)
# Count points below the error zone for data1
points_below_zone_data1 = data1['data'].values < (np.min(tau_values) - 0.05)
# Count points below the error zone for data2
points_below_zone_data2 = data2['data'].values < (np.min(tau_values) - 0.05)
# Total number of points below the error zone
total_points_below_zone = np.sum(points_below_zone_data1) + np.sum(points_below_zone_data2)
# Total number of points above the error zone
total_points_above_zone = np.sum(points_above_zone_data1) + np.sum(points_above_zone_data2)
# Percentage of points above the error zone
percentage_above_zone = (total_points_above_zone / total_points) * 100
# Percentage of points below the error zone
percentage_below_zone = (total_points_below_zone / total_points) * 100
I looked for the error in my data points and tried changing them in various ways. Both inputs are .csv files with a single column of values, but they contain different numbers of entries. I also checked whether there really are no points below the zone by displaying the lowest values. Beyond that I am at a loss, and I hope you can help me find a solution to this problem.
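To make the problem easier to reproduce, here is a minimal, self-contained sketch of the counting step. It is only an approximation of my setup: the synthetic values, their range, and the random seed are assumptions standing in for my two CSV files, and `count_zone` is a hypothetical helper name that does not appear in my real code. The thresholds are built exactly as in the code above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two CSV files (assumption: one non-negative
# column named 'data', different numbers of rows). NOT the real values.
rng = np.random.default_rng(0)
data1 = pd.DataFrame({"data": rng.uniform(0.1, 2.0, size=50)})
data2 = pd.DataFrame({"data": rng.uniform(0.1, 2.0, size=80)})

# Same construction as in the original code
data1_sorted = data1.sort_values(by="data")
tau_values = data1_sorted["data"].values * 0.15 + 0.05

# The counting step compares every value against these two scalars
lower = np.min(tau_values) - 0.05
upper = np.max(tau_values) + 0.05


def count_zone(values, lower, upper):
    """Hypothetical helper: return (below, inside, above) counts."""
    values = np.asarray(values)
    below = int(np.sum(values < lower))
    above = int(np.sum(values > upper))
    inside = int(values.size - below - above)
    return below, inside, above


below1, inside1, above1 = count_zone(data1["data"], lower, upper)
below2, inside2, above2 = count_zone(data2["data"], lower, upper)

print("lower threshold:", lower)
print("smallest value in data1:", data1["data"].min())
print("data1 (below, inside, above):", (below1, inside1, above1))
print("data2 (below, inside, above):", (below2, inside2, above2))
```

In this sketch the lower threshold works out to roughly 0.15 times the smallest value in `data1`, because the `+ 0.05` and `- 0.05` cancel. For positive values that number is always smaller than the smallest value itself, so the "below" count for `data1` cannot be anything other than zero here. I am not sure whether this is the same thing that happens with my real data, or whether the comparison should instead be made point by point against the boundary lines.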