I am creating a scatter density plot for my bachelor thesis. I would like to count the points inside the error zone as well as the points above and below it. However, my code reports that there are no points below the error zone. I use the following code to draw the error zone and count the points:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# Sort data in ascending order
data1_sorted = data1.sort_values(by='data')

# Use of tau values from the already opened data frame data1_sorted
tau_values = data1_sorted['data'].values * 0.15 + 0.05

# Plotting the expected error zone (only the boundary lines)
ax.plot(tau_values + 0.05, data1_sorted['data'].values, color='green', linestyle='--')
ax.plot(tau_values - 0.05, data1_sorted['data'].values, color='green', linestyle='--')


# Count points within the error zone for data1
points_inside_zone_data1 = (data1['data'].values >= (np.min(tau_values) - 0.05)) & (data1['data'].values <= (np.max(tau_values) + 0.05))

# Count points within the error zone for data2
points_inside_zone_data2 = (data2['data'].values >= (np.min(tau_values) - 0.05)) & (data2['data'].values <= (np.max(tau_values) + 0.05))

# Total number of points within the error zone
total_points_inside_zone = np.sum(points_inside_zone_data1) + np.sum(points_inside_zone_data2)

# Total number of points in both data sets
total_points = len(data1) + len(data2)

# Percentage of points within the error zone
percentage_inside_zone = (total_points_inside_zone / total_points) * 100

# Count points above the error zone for data1
points_above_zone_data1 = data1['data'].values > (np.max(tau_values) + 0.05)

# Count points above the error zone for data2
points_above_zone_data2 = data2['data'].values > (np.max(tau_values) + 0.05)

# Count points below the error zone for data1
points_below_zone_data1 = data1['data'].values < (np.min(tau_values) - 0.05)

# Count points below the error zone for data2
points_below_zone_data2 = data2['data'].values < (np.min(tau_values) - 0.05)

# Total number of points below the error zone
total_points_below_zone = np.sum(points_below_zone_data1) + np.sum(points_below_zone_data2)

# Total number of points above the error zone
total_points_above_zone = np.sum(points_above_zone_data1) + np.sum(points_above_zone_data2)

# Total number of points in both data sets
total_points = len(data1) + len(data2)

# Percentage of points above the error zone
percentage_above_zone = (total_points_above_zone / total_points) * 100

# Percentage of points below the error zone
percentage_below_zone = (total_points_below_zone / total_points) * 100

# Percentage of points within the error zone
percentage_inside_zone = (total_points_inside_zone / total_points) * 100
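
A possible reason for the zero count, shown as a minimal sketch: `np.min(tau_values) - 0.05` is one scalar, equal to `0.15 * min(data)`, so every value is compared against the same number instead of against its own boundary. The array `values` below is made up and only stands in for `data1['data']`; it is assumed to be non-negative.

```python
import numpy as np

# Hypothetical non-negative values standing in for data1['data'] (not the real data)
values = np.array([0.08, 0.12, 0.30, 0.55, 0.90])

# Same construction as in the code above
tau_values = values * 0.15 + 0.05

# This is a single number, not a per-point boundary: 0.15 * values.min()
lower_threshold = np.min(tau_values) - 0.05

# No non-negative value can be smaller than 15 % of the smallest value
below = values < lower_threshold
n_below = int(np.sum(below))
print(lower_threshold, n_below)
```

With this construction, a value from the second data set would only count as "below" if it were smaller than 15 % of the minimum of the first data set.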

[Figure: combined scatter and density plot with the two green error-zone lines]

I looked for the error in my data points and tried changing them (the two data sets are .csv files, each with a single column of values, but with different numbers of entries). I also checked whether there really are no points below the zone by displaying the lowest values. Beyond that I am at a loss. I hope you can help me find a solution to this problem.
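
One way the counting could be done point by point, instead of against a single minimum and maximum, is sketched below. It rests on two assumptions that are not confirmed by the data described above: that the two data sets can be paired one-to-one (same length and order, which the two files currently are not), and that the zone is meant as reference ± (0.05 + 0.15 * reference). The names `reference`, `retrieved`, and `classify_against_envelope` are hypothetical, and the numbers are made up.

```python
import numpy as np

def classify_against_envelope(reference, retrieved):
    """Count retrieved values below, inside, and above
    reference +/- (0.05 + 0.15 * reference), evaluated per point."""
    reference = np.asarray(reference, dtype=float)
    retrieved = np.asarray(retrieved, dtype=float)
    if reference.shape != retrieved.shape:
        raise ValueError("reference and retrieved must be paired one-to-one")

    # Assumed half-width of the error zone, one value per point
    half_width = 0.05 + 0.15 * reference
    lower = reference - half_width
    upper = reference + half_width

    below = retrieved < lower
    above = retrieved > upper
    inside = ~(below | above)
    return int(below.sum()), int(inside.sum()), int(above.sum())

# Made-up paired values, for illustration only
reference = np.array([0.10, 0.20, 0.40, 0.80])
retrieved = np.array([0.02, 0.22, 0.60, 0.75])

n_below, n_inside, n_above = classify_against_envelope(reference, retrieved)
print(n_below, n_inside, n_above)
```

The percentages then follow by dividing each count by `len(reference)`. If the files cannot be paired, this approach does not apply as written, because an error zone defined around a reference value needs a matching reference for every point.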

jurb
  • If you are using libraries such as pandas or matplotlib, add the appropriate tags. – Michael Butscher Aug 15 '23 at 11:05
  • I'm not clear on what your issue is exactly (also not sure how to read the scatter plot). Do you know that there are points below the error zone that your code is not finding? A minimal reproducible example would be great, see https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples for some guidance. – Andrew McClement Aug 15 '23 at 11:55
  • Okay, let me try to explain. This is a combined scatter and density plot. The two green straight lines in the scatter plot are my error zone, which I created from the first data set with the formula EE = ±(0.05 + tau*15%). In other words, everything in the data sets that is below -(0.05 + tau*15%) should actually be in there. As you can see in the plot, there are points below half the EE. Such plots are often used in aerosol research. – jurb Aug 15 '23 at 13:33

0 Answers