I am reading a research paper and trying to replicate its results. The paper illustrates distribution shift with plots that look like these:
Here is the full paper.
What I have tried:
To better understand the topic, I am trying to reproduce the plots from the paper's data. Here is the data from the paper:
data = {'wiki': {'squad': [94.9, 94.9, 95.6, 95.1, 94.9, 94.9, 94.7, 94.6, 94.4, 93.4, 93.3, 93.1, 92.7, 92.6, 92.6, 92.0, 92.0, 91.9, 91.8, 91.8, 91.6, 91.4, 91.4, 91.3, 91.3, 91.1, 90.9, 90.6, 89.8, 89.5, 89.2, 88.9, 88.9, 88.6, 88.5, 88.2, 88.1, 88.1, 88.0, 87.8, 87.7, 87.4, 87.3, 86.9, 86.8, 86.7, 86.7, 86.7, 86.0, 85.9, 85.8, 85.5, 85.5, 85.5, 85.5, 85.3, 85.3, 85.1, 84.9, 84.9, 84.9, 84.7, 84.6, 84.5, 84.2, 83.9, 83.9, 83.3, 83.1, 82.8, 82.8, 82.7, 82.7, 82.6, 82.5, 82.4, 81.9, 81.9, 81.8, 81.0, 81.0, 80.5, 80.5, 80.2, 80.1, 79.9, 79.8, 79.8, 79.5, 79.4, 78.7, 78.2, 77.2, 77.0, 76.8, 76.4, 76.3, 74.6, 74.6, 73.7, 73.1, 72.0, 72.0, 71.4, 71.0, 64.0, 54.7], 'wiki': [92.5, 92.4, 92.3, 92.3, 92.2, 92.3, 92.2, 92.3, 91.8, 91.5, 91.0, 91.0, 90.8, 90.4, 90.6, 89.7, 89.6, 89.4, 89.6, 89.4, 89.4, 89.2, 89.2, 90.8, 89.6, 89.3, 88.7, 88.1, 88.0, 85.6, 87.0, 87.0, 86.8, 86.5, 86.5, 86.1, 85.7, 86.2, 85.8, 86.8, 86.2, 85.1, 85.9, 85.1, 85.0, 85.6, 85.1, 84.7, 84.3, 83.8, 83.8, 83.7, 83.1, 83.7, 84.4, 83.9, 82.3, 84.1, 83.8, 83.4, 83.4, 83.3, 83.3, 82.9, 83.1, 82.3, 82.5, 81.8, 81.8, 80.1, 81.6, 81.9, 81.4, 81.3, 81.9, 81.6, 81.2, 80.0, 80.2, 79.8, 80.0, 80.3, 79.4, 78.8, 79.5, 79.2, 78.8, 79.0, 78.9, 78.4, 77.2, 77.4, 76.5, 76.6, 76.8, 75.6, 76.6, 74.7, 74.9, 73.4, 73.1, 72.5, 72.1, 67.6, 70.6, 62.1, 54.1]}, 'new_york': {'squad': [94.9, 94.9, 95.6, 95.1, 94.9, 94.9, 94.7, 94.6, 94.4, 93.4, 93.3, 93.1, 92.7, 92.6, 92.6, 92.0, 92.0, 91.9, 91.8, 91.8, 91.6, 91.4, 91.4, 91.3, 91.3, 91.1, 90.9, 90.6, 89.8, 89.5, 89.2, 88.9, 88.9, 88.6, 88.5, 88.2, 88.1, 88.1, 88.0, 87.8, 87.8, 87.7, 87.4, 87.3, 86.9, 86.8, 86.7, 86.7, 86.7, 86.0, 85.9, 85.8, 85.5, 85.5, 85.5, 85.3, 85.3, 85.1, 84.9, 84.9, 84.9, 84.7, 84.6, 84.5, 84.2, 83.9, 83.9, 83.3, 83.1, 82.8, 82.8, 82.7, 82.7, 82.6, 82.5, 82.4, 81.9, 81.9, 81.8, 81.0, 81.0, 80.5, 80.5, 80.2, 80.1, 79.9, 79.8, 79.8, 79.5, 79.4, 78.7, 78.2, 77.2, 77.0, 76.8, 76.4, 76.3, 74.6, 74.6, 73.7, 73.1, 72.0, 72.0, 71.4, 71.0, 64.0, 54.7], 'yt': 
[95.0, 96.3, 93.7, 84.4, 92.8, 92.9, 93.4, 92.4, 89.4, 91.7, 90.8, 91.1, 90.6, 88.3, 90.5, 88.8, 88.9, 88.9, 88.5, 88.4, 88.3, 88.6, 88.6, 90.6, 88.6, 88.2, 88.3, 87.4, 86.2, 83.3, 85.1, 86.0, 84.9, 85.8, 84.1, 84.7, 84.0, 84.0, 84.8, 84.3, 86.1, 86.1, 84.0, 83.7, 83.8, 81.9, 83.5, 83.5, 82.6, 82.9, 82.7, 82.7, 82.6, 79.9, 81.0, 80.7, 78.7, 82.1, 81.0, 81.6, 81.4, 80.2, 80.3, 79.8, 80.8, 78.4, 78.9, 80.3, 77.0, 77.4, 78.8, 79.5, 78.1, 78.5, 78.8, 78.5, 77.6, 75.8, 77.0, 77.5, 77.9, 78.1, 76.5, 74.9, 76.2, 77.2, 75.9, 75.1, 76.3, 75.0, 74.3, 75.7, 73.8, 71.9, 73.5, 73.0, 73.0, 71.3, 71.4, 68.8, 68.1, 67.0, 66.8, 68.7, 68.4, 60.2, 51.7]}, 'reddit': {'squad': [94.9, 94.9, 95.6, 95.1, 84.2, 10.1, 85.4, 83.6, 93.3, 81.8, 11.5, 81.5, 81.4, 91.9, 77.5, 11.5, 78.5, 79.2, 91.3, 80.2, 12.8, 77.9, 74.3, 88.9, 74.8, 11.8, 75.5, 73.9, 88.0, 74.0, 10.3, 74.5, 75.5, 86.8, 74.2, 13.8, 72.0, 73.4, 85.5, 71.6, 12.9, 71.8, 71.6, 84.7, 68.7, 13.5, 69.1, 71.5, 82.8, 67.6, 16.1, 68.3, 66.8, 81.9, 65.0, 14.9, 64.7, 59.8, 79.9, 61.4, 18.8, 62.5, 64.0, 77.0, 59.5, 18.5, 57.5, 61.3, 72.0, 50.5, 10.7, 49.8], 'reddit': [93.8, 93.8, 94.5, 79.0, 85.5, 94.7, 84.9, 11.4, 82.2, 83.1, 92.6, 80.9, 11.3, 80.1, 78.9, 91.6, 77.8, 12.9, 80.7, 81.5, 90.6, 77.2, 16.0, 76.6, 76.3, 88.5, 74.8, 15.0, 75.4, 75.5, 87.7, 73.8, 12.6, 74.6, 75.8, 86.7, 71.2, 13.3, 72.0, 73.2, 85.3, 71.0, 14.1, 69.4, 70.3, 83.9, 68.3, 12.6, 67.7, 69.2, 82.6, 67.5, 16.4, 62.7, 66.6, 80.5, 63.8, 21.2, 64.7, 63.1, 79.5, 61.7, 15.1, 64.8, 61.2, 74.6, 56.6, 13.3, 52.8, 52.3, 64.0, 48.9]}, 'amazon': {'squad': [88.1, 88.1, 88.0, 87.8, 87.8, 87.7, 87.4, 87.3, 86.9, 86.9, 86.8, 86.7, 86.7, 86.7, 86.0, 86.0, 85.9, 85.8, 85.5, 85.5, 85.5, 85.5, 85.5, 85.3, 85.3, 85.1, 84.9, 84.9, 84.9, 84.7, 84.6, 84.5, 84.2, 83.9, 83.9, 83.5, 83.3, 83.1, 82.8, 82.8, 82.7, 82.7, 82.6, 82.5, 82.4, 81.9, 81.9, 81.8, 81.7, 81.5, 81.0, 81.0, 80.5, 88.1, 88.1, 88.0, 87.8, 87.8, 87.7, 87.4, 87.3, 86.9, 86.9, 86.8, 86.7, 86.7, 86.7, 86.0, 86.0, 85.9, 85.8, 85.5, 
85.5, 85.5, 85.5, 85.5, 85.3, 85.3, 85.1, 84.9, 84.9, 84.9, 84.7, 84.6, 84.5, 84.2, 83.9, 83.9, 83.5, 83.3, 83.1, 82.8, 82.8, 82.7, 82.7, 82.6, 82.5, 82.4, 81.9, 81.9, 81.8, 81.7, 81.5, 81.0, 81.0, 80.5, 80.5, 80.2, 80.1, 79.9, 79.8, 79.8, 79.5, 79.4, 78.7, 78.2, 77.8, 77.2, 77.0, 76.8, 76.4, 76.3, 74.6, 74.6, 73.7, 73.1, 72.0, 72.0, 71.4, 71.0, 64.0, 54.7], 'amazon': [69.4, 70.0, 74.2, 73.6, 75.7, 75.4, 70.8, 72.1, 70.6, 70.6, 70.3, 72.0, 71.8, 66.6, 68.6, 67.7, 69.2, 69.1, 67.7, 69.3, 62.3, 68.7, 70.3, 69.0, 63.0, 68.2, 66.6, 69.0, 68.7, 64.3, 64.2, 66.7, 66.6, 63.0, 66.0, 62.2, 67.7, 62.9, 64.1, 62.2, 63.6, 61.6, 62.7, 65.5, 60.1, 59.9, 60.1, 64.9, 59.9, 58.8, 61.2, 61.5, 64.2, 69.4, 70.0, 74.2, 73.6, 75.7, 75.4, 70.8, 72.1, 70.6, 70.6, 70.3, 72.0, 71.8, 66.6, 68.6, 67.7, 69.2, 69.1, 67.7, 69.3, 62.3, 68.7, 70.3, 69.0, 63.0, 68.2, 66.6, 69.0, 68.7, 64.3, 64.2, 66.7, 66.6, 63.0, 66.0, 62.2, 67.7, 62.9, 64.1, 62.2, 63.6, 61.6, 62.7, 65.5, 60.1, 59.9, 60.1, 64.9, 59.9, 58.8, 61.2, 61.5, 64.2, 60.1, 56.9, 56.9, 60.1, 57.8, 56.7, 58.9, 57.8, 57.6, 57.8, 57.6, 56.2, 59.0, 54.7, 54.7, 54.7, 54.4, 53.7, 55.6, 49.4, 47.0, 46.5, 59.5, 51.7, 46.0, 45.0]}}
Here is my code for the y = x plot:
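import seaborn as sns
import matplotlib.pyplot as plt

# Pair the two score lists stored under the 'reddit' key
x = data['reddit']['squad']
y = data['reddit']['reddit']

# Create the line plot (seaborn 0.12+ only accepts x and y as keyword arguments)
ax = sns.lineplot(x=x, y=y)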
And here is the linear fit plot:
ax = sns.regplot(x=x, y=y)
But the result doesn't look like the plot in the paper. How can I visualize the data in exactly the same way, including the other series (Model F1, Human F1), to replicate the graph as it is?
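For reference, this is the direction I am considering instead of `lineplot`: a scatter of the paired scores with a dashed y = x reference line and a least-squares fit drawn on top, using plain matplotlib. It is only a minimal sketch. The helper name `plot_shift`, the axis labels, and the styling are my own assumptions, not taken from the paper, and the sample lists are every tenth pair (the `[::10]` slice) of `data['wiki']` so the snippet runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt

# Every tenth pair from data['wiki'] above (the [::10] slice), kept short so
# this block is self-contained; swap in the full lists for the real plot.
squad_f1 = [94.9, 93.3, 91.6, 89.2, 87.7, 85.8, 84.9, 82.8, 81.0, 78.7, 73.1]
wiki_f1 = [92.5, 91.0, 89.4, 87.0, 86.2, 83.8, 83.4, 81.6, 80.0, 77.2, 73.1]


def plot_shift(x, y, xlabel, ylabel, ax=None):
    """Scatter y against x with a y = x reference line and a linear fit.

    Hypothetical helper: name, labels and styling are assumptions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        # The two lists must pair up one-to-one, otherwise the plot is meaningless.
        raise ValueError(f"length mismatch: {x.shape} vs {y.shape}")
    if ax is None:
        _, ax = plt.subplots(figsize=(4, 4))

    # Shared limits so the y = x diagonal runs corner to corner.
    lo = min(x.min(), y.min()) - 2.0
    hi = max(x.max(), y.max()) + 2.0

    ax.scatter(x, y, s=15, label="Model F1")
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray", label="y = x")

    # Ordinary least-squares line through the points.
    slope, intercept = np.polyfit(x, y, deg=1)
    ax.plot([lo, hi], [slope * lo + intercept, slope * hi + intercept],
            color="tab:red", label="Linear fit")

    ax.set_xlim(lo, hi)
    ax.set_ylim(lo, hi)
    ax.set_aspect("equal")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    return ax, slope, intercept


# Axis labels are assumed readings of the dictionary keys.
ax, slope, intercept = plot_shift(squad_f1, wiki_f1,
                                  "F1 on SQuAD", "F1 on new Wikipedia data")
```

Calling `plt.show()` or `ax.figure.savefig(...)` afterwards displays or saves the figure. Points below the dashed diagonal score lower on the new data than on the original. A Human F1 point could presumably be added with one more `ax.scatter` call and a distinct marker, but I do not have that value in the data above.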
I want to analyze the distribution shift as shown in the paper, but the plots my code produces don't look good. Are there better ways to do the same analysis on the above data?
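Beyond the plot itself, I was thinking of summarizing the shift numerically, roughly as sketched below: the per-model gap between new-data and original scores, the slope and intercept of a linear fit, and the Pearson correlation. The function name `summarize_shift` and the choice of statistics are my own assumptions rather than the paper's method; the sample values are again every tenth pair from `data['wiki']`.

```python
import numpy as np


def summarize_shift(x, y):
    """Summarize how scores on new data (y) relate to original scores (x).

    Hypothetical helper: the statistics chosen here are an assumption,
    not the paper's analysis.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError(f"length mismatch: {x.shape} vs {y.shape}")

    gap = y - x  # negative values mean the model does worse on the new data
    slope, intercept = np.polyfit(x, y, deg=1)
    return {
        "mean_gap": float(gap.mean()),
        "largest_drop": float(gap.min()),
        "slope": float(slope),
        "intercept": float(intercept),
        "pearson_r": float(np.corrcoef(x, y)[0, 1]),
    }


# Every tenth pair from data['wiki'] above (the [::10] slice).
squad_f1 = [94.9, 93.3, 91.6, 89.2, 87.7, 85.8, 84.9, 82.8, 81.0, 78.7, 73.1]
wiki_f1 = [92.5, 91.0, 89.4, 87.0, 86.2, 83.8, 83.4, 81.6, 80.0, 77.2, 73.1]

summary = summarize_shift(squad_f1, wiki_f1)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A slope near 1 with a negative intercept would suggest a roughly constant drop across models, while a slope away from 1 would suggest the drop depends on how strong the model is. The same call could be repeated for each key in `data`, as long as the two lists under that key have equal length.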