I would like to plot my data in the style of a reference figure, showing the median in each bin along with the 25th and 75th percentile values. In that figure, the solid line and open circles show the median values in each bin, and the broken lines show the 25% and 75% values.

I have this sample data, and I tried the following to get a similar plot:

import numpy as np
import matplotlib.pyplot as plt
from astropy.table import Table
data=Table.read('sample_data.fits')
# Sample data
X=data['density']
Y=data['lineflux']
total_bins = 15
bins = np.linspace(min(X),max(X), total_bins)
delta = bins[1]-bins[0]
idx  = np.digitize(X,bins)
running_median = [np.median(Y[idx==k]) for k in range(total_bins)]

plt.plot(X,Y,'.')
plt.plot(bins-delta/2,running_median,'--r',marker='o',fillstyle='none',markersize=20,alpha=1)
plt.xlabel(r'log $\delta_{5th}[Mpc^{-3}]$')
plt.ylabel('log OII[flux]')
plt.loglog()
plt.axis('tight')
plt.show()

In the resulting plot there is a large offset. I also tried changing the bin size, but the offset remains. How can I plot this correctly, and how can I include the 25% and 75% values in my plot, as in the figure I am trying to reproduce?

John Singh
  • I think your regular bins are the problem. Take a look at https://stackoverflow.com/questions/6855710/how-to-have-logarithmic-bins-in-a-python-histogram, this will solve the offset issue. – GrimTrigger Dec 17 '20 at 10:20

1 Answer

To answer the second question as well: you can use np.percentile. I had to lower the number of bins because one bin contained no data, which causes problems for the percentile calculation. For the logarithmic bins, see my comment above:

import numpy as np
import matplotlib.pyplot as plt
from astropy.table import Table

data=Table.read('sample_data.fits')
# Sample data
X=data['density']
Y=data['lineflux']
total_bins = 10
#bins = np.linspace(min(X), max(X), total_bins)
bins = np.logspace(np.log10(0.0001), np.log10(0.1), total_bins)
delta = bins[1]-bins[0]
idx  = np.digitize(X, bins)
running_median = [np.median(Y[idx==k]) for k in range(total_bins)]

running_prc25 = [np.percentile(Y[idx==k], 25) for k in range(total_bins)]
running_prc75 = [np.percentile(Y[idx==k], 75) for k in range(total_bins)]

plt.plot(X,Y,'.')
plt.plot(bins-delta/2,running_median,'-r',marker='o',fillstyle='none',markersize=20,alpha=1)

plt.plot(bins-delta/2,running_prc25,'--r',marker=None,fillstyle='none',markersize=20,alpha=1)
plt.plot(bins-delta/2,running_prc75,'--r',marker=None,fillstyle='none',markersize=20,alpha=1)

plt.xlabel(r'log $\delta_{5th}[Mpc^{-3}]$')
plt.ylabel('log OII[flux]')
plt.loglog()
plt.axis('tight')
plt.show()

This produces the scatter plot with the median drawn as a solid red line with open circles and the 25% and 75% values as dashed red lines.
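One thing to watch in the code above: with logarithmic bins, `bins-delta/2` is not the middle of each bin (`delta` is only the width of the first bin), and `np.digitize` numbers the bins from 1, so `k = 0` collects only points below the first edge. A minimal sketch of an alternative follows. It assumes synthetic data in place of `sample_data.fits`, and `binned_percentiles` is a hypothetical helper, not part of NumPy or of the code above. It uses the geometric mean of neighbouring edges as the bin centre and returns NaN for empty bins:

```python
import numpy as np

def binned_percentiles(x, y, edges, q=(25, 50, 75)):
    """Return bin centres and the requested percentiles of y in each x-bin.

    Bins that contain no points get NaN instead of raising an error.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.asarray(edges, dtype=float)
    # np.digitize returns i when edges[i-1] <= x < edges[i],
    # so the bins between the edges are numbered 1 .. len(edges)-1.
    idx = np.digitize(x, edges)
    n_bins = len(edges) - 1
    stats = np.full((len(q), n_bins), np.nan)
    for k in range(1, n_bins + 1):
        in_bin = y[idx == k]
        if in_bin.size:
            stats[:, k - 1] = np.percentile(in_bin, q)
    # Geometric mean of the edges: the midpoint of each bin on a log axis.
    centres = np.sqrt(edges[:-1] * edges[1:])
    return centres, stats

# Synthetic stand-in for the FITS columns (an assumption, not the real data).
rng = np.random.default_rng(0)
X = 10 ** rng.uniform(-4, -1, 2000)
Y = X ** 0.5 * 10 ** rng.normal(0, 0.2, X.size)

edges = np.logspace(-4, -1, 10)  # 10 edges give 9 bins
centres, (prc25, median, prc75) = binned_percentiles(X, Y, edges)
```

The three curves can then be drawn with `plt.plot(centres, median, '-r', marker='o', fillstyle='none')` and `plt.plot(centres, prc25, '--r')`, `plt.plot(centres, prc75, '--r')` in place of the `bins-delta/2` calls.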

EDIT:

To show a filled band, you can try the following (only the relevant section is shown):

fig, ax = plt.subplots()

plt.plot(X,Y,'.')
plt.plot(bins-delta/2,running_median,'-r',marker='o',fillstyle='none',markersize=20,alpha=1)

#plt.plot(bins-delta/2,running_prc25,'--r',marker=None,fillstyle='none',markersize=20,alpha=1)
#plt.plot(bins-delta/2,running_prc75,'--r',marker=None,fillstyle='none',markersize=20,alpha=1)

ax.fill_between(bins-delta/2,running_prc25,running_median, facecolor='orange')
ax.fill_between(bins-delta/2,running_prc75,running_median, facecolor='orange')

This fills the region between the 25% and 75% curves in orange.
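The two `fill_between` calls above meet at the median, so together they shade the whole 25% to 75% range. As a sketch, the same band can be drawn with a single call by passing the two percentile curves directly. The arrays below are made-up placeholders for the per-bin values, and the `Agg` backend is chosen only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical per-bin values standing in for the arrays computed earlier.
centres = np.logspace(-4, -1, 8)
running_median = 1e-16 * (centres / 1e-4) ** 0.3
running_prc25 = 0.7 * running_median
running_prc75 = 1.4 * running_median

fig, ax = plt.subplots()
ax.plot(centres, running_median, '-r', marker='o', fillstyle='none')
# One call shades everything between the lower and the upper curve.
band = ax.fill_between(centres, running_prc25, running_prc75,
                       facecolor='orange', alpha=0.4)
ax.set_xscale('log')
ax.set_yscale('log')
fig.canvas.draw()  # render; use plt.show() or fig.savefig(...) in practice
```

A partly transparent band (`alpha=0.4` here, an arbitrary choice) keeps the underlying scatter points visible.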

GrimTrigger
  • How do I choose the bin size for a given data set? Suppose I have another density whose minimum value is 0.010651032198077923 and whose maximum value is 6.182012487319087. I cannot apply these bins. What would be appropriate bins for this, and how do I know what a good size is? Thank you – John Singh Dec 17 '20 at 11:58
  • That's a good question: `np.logspace` lets you set the lower and upper limits of the intervals. I just eyeballed them from the graph, but you can simply go from np.log10(lower_limit) to np.log10(upper_limit). As for the size, for me this depends on the underlying data and on whether what you are showing is meaningful. I would start with some bin size and look at how many data points fall in each bin (it is hard to recommend a general approach, but since you want a median and percentiles there should be 10 or more per bin). Looking at the graph, I would only interpret the range 10^-4 to maybe 3*10^-3. – GrimTrigger Dec 17 '20 at 12:50
  • To continue: and maybe just draw the curve there. The graph you showed at the beginning did the same: it only shows the curve in the middle range, where there are enough data points. – GrimTrigger Dec 17 '20 at 12:52
  • How can I fill the area between the two percentiles? – John Singh Feb 01 '21 at 13:18
  • You can try something like `ax.fill_between(bins-delta/2,running_prc25,running_median)` and `ax.fill_between(bins-delta/2,running_prc75,running_median)` (if I understand the question correctly). Just add `fig, ax = plt.subplots()` before that. – GrimTrigger Feb 01 '21 at 13:51
  • added to question – GrimTrigger Feb 01 '21 at 14:11
  • I tried to plot an error bar for each median bin, but I was not able to do it. Could you please help? Thank you so much – John Singh Feb 16 '21 at 05:42
  • I guess you could add `running_std = [np.std(Y[idx==k]) for k in range(total_bins)]` and then `plt.errorbar(bins-delta/2,running_median, running_std)` before the `ax_fill...` – GrimTrigger Feb 16 '21 at 11:47