I was reading about the BM25 algorithm and have a question, prompted by an image, about how its IDF is calculated.
The image below is described as showing the difference in IDF between BM25 and TF-IDF.
The IDF formula for TF-IDF and the IDF formula for BM25 are shown below.
IDF = Math.log(N / df) // TF-IDF
IDF = Math.log(1 + (N - df + 0.5) / (df + 0.5)) // BM25
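To make the comparison concrete, here is a minimal sketch of the two formulas as plain Python functions. The function names and the example value N = 100 are my own choices for illustration. The third function relies only on algebra: 1 + (N - df + 0.5) / (df + 0.5) equals (N + 1) / (df + 0.5), so the BM25 expression can be written as a single log of a ratio, which makes it easier to compare with log(N / df).

```python
import math


def tfidf_idf(n_docs: int, df: int) -> float:
    """Plain TF-IDF style IDF: log(N / df)."""
    return math.log(n_docs / df)


def bm25_idf(n_docs: int, df: int) -> float:
    """BM25 IDF as written above: log(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))


def bm25_idf_simplified(n_docs: int, df: int) -> float:
    """Same value after combining the fraction: log((N + 1) / (df + 0.5))."""
    return math.log((n_docs + 1) / (df + 0.5))


N = 100  # example corpus size, chosen only for illustration

# Print df, TF-IDF IDF, and BM25 IDF side by side for a few df values.
for df in (1, 10, 50, 100):
    print(df, round(tfidf_idf(N, df), 4), round(bm25_idf(N, df), 4))
```

Written this way, the two formulas differ only by the +1 in the numerator and the +0.5 in the denominator, so I would expect the curves to have a very similar shape.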
It seems that a graph like the one in the image above cannot be produced with BM25's IDF formula. Am I missing something?
I tried to plot both formulas using Python.
import math

import matplotlib.pyplot as plt

N = 100                    # total number of documents
dfs = list(range(1, 17))   # document frequencies to plot (1..16)

# BM25 IDF
bm25_idf = [math.log(1 + (N - df + 0.5) / (df + 0.5)) for df in dfs]
plt.plot(dfs, bm25_idf, label='BM25_IDF')

# TF-IDF style IDF; note that this uses df + 1 in the denominator
idf = [math.log(N / (df + 1)) for df in dfs]
plt.plot(dfs, idf, label='idf')

plt.xlabel('df')
plt.ylabel('IDF')
plt.legend()
plt.show()
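As a numeric cross-check of what the plot shows, the sketch below tabulates the gap between the two plotted series without drawing anything. The helper name `gap` is hypothetical, and it assumes the same N = 100 and the same df range of 1 to 16 as the plotting code.

```python
import math

N = 100  # same corpus size as in the plot above


def gap(df: int, n_docs: int = N) -> float:
    """BM25 IDF minus the smoothed IDF log(N / (df + 1)) at a given df."""
    bm25 = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    smoothed = math.log(n_docs / (df + 1))
    return bm25 - smoothed


# Gap at each plotted document frequency, df = 1..16.
gaps = [gap(df) for df in range(1, 17)]
largest = max(gaps)
print(f"largest gap for df in 1..16: {largest:.4f}")
```

Over this range the gap is positive everywhere and is largest at df = 1, so the two curves stay close together and never cross.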