I have a large dataset of 18,000 players. Each player has features such as Overall and Finishing, and I want to make a scatter density plot, because with a "normal" scatter plot I can't see where there are more players and where there are fewer.
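A density plot of this kind is essentially a 2-D histogram: the plane is split into rectangular bins, and each bin is colored by how many players fall into it. Below is a minimal sketch of just the counting step with `np.histogram2d`, using six made-up players (the values are hypothetical, not from the real dataset).

```python
import numpy as np

# Hypothetical toy data: Finishing and Overall ratings for six players
finishing = np.array([40, 42, 45, 70, 72, 90])
overall = np.array([60, 61, 62, 75, 76, 88])

# Split each axis into 2 equal-width bins and count the players in each cell.
# counts[i, j] is the number of players in Finishing bin i and Overall bin j.
counts, x_edges, y_edges = np.histogram2d(finishing, overall, bins=2)
print(counts)
# [[3. 0.]
#  [0. 3.]]
```

A density plot simply colors each cell by its count, so crowded regions stand out even when thousands of dots overlap.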
My normal scatter plot code looks like this:
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

DATA_LOCATION = "main_players.csv"
FIRST_ATTRIBUTE = "Finishing"
SECOND_ATTRIBUTE = "Overall"

# Load the player data and select the two columns to compare
datas = pd.read_csv(DATA_LOCATION)
x = datas[[FIRST_ATTRIBUTE]]
y = datas[[SECOND_ATTRIBUTE]]

# One red dot per player
plt.scatter(x, y, color='r')
plt.xlabel(FIRST_ATTRIBUTE)
plt.ylabel(SECOND_ATTRIBUTE)
plt.title('Relationship between Overall and Finishing')
plt.show()
```
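One detail in the code above that may matter: the double brackets in `datas[[FIRST_ATTRIBUTE]]` return a one-column DataFrame (2-D), while single brackets return a Series (1-D). `plt.scatter` flattens its inputs, so the 2-D version still plots, but `plt.hist2d` passes `x` and `y` to `np.histogram2d`, which expects 1-D sequences. Since no error message is shown, this is only a likely cause, not a confirmed one. The three-player DataFrame below is a hypothetical stand-in for `main_players.csv`.

```python
import pandas as pd

# Hypothetical three-player DataFrame standing in for main_players.csv
datas = pd.DataFrame({"Finishing": [45, 70, 90], "Overall": [62, 75, 88]})

# Double brackets return a DataFrame with one column (2-D)
print(datas[["Finishing"]].shape)  # (3, 1)

# Single brackets return a Series (1-D)
print(datas["Finishing"].shape)    # (3,)
```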
I searched online and found this code:
```python
# libraries
import matplotlib.pyplot as plt
import numpy as np

# create data
x = np.random.normal(size=50000)
y = x * 3 + np.random.normal(size=50000)

# Big bins
plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.jet)
plt.show()

# Small bins
plt.hist2d(x, y, bins=(300, 300), cmap=plt.cm.jet)
plt.show()

# If you do not use the same number of bins for x and y, the bins aren't square!
plt.hist2d(x, y, bins=(300, 30), cmap=plt.cm.jet)
plt.show()
```
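For comparison, here is a sketch of the same `hist2d` call applied to player columns. It assumes a synthetic DataFrame in place of `main_players.csv` (the random ratings are hypothetical) and selects each column with single brackets so that `x` and `y` are 1-D Series. It uses the non-interactive Agg backend so it runs without a display; in a normal session you would drop that line and call `plt.show()` at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

FIRST_ATTRIBUTE = "Finishing"
SECOND_ATTRIBUTE = "Overall"

# Hypothetical stand-in for main_players.csv: 18000 randomly generated players
rng = np.random.default_rng(0)
finishing = rng.integers(10, 95, size=18000)
overall = np.clip(40 + finishing * 0.4 + rng.normal(0, 5, size=18000), 40, 95)
datas = pd.DataFrame({FIRST_ATTRIBUTE: finishing, SECOND_ATTRIBUTE: overall})

# Single brackets select 1-D Series, which is what hist2d expects for x and y
x = datas[FIRST_ATTRIBUTE]
y = datas[SECOND_ATTRIBUTE]

fig, ax = plt.subplots()
# counts holds the number of players in each of the 50 x 50 bins
counts, x_edges, y_edges, image = ax.hist2d(x, y, bins=(50, 50), cmap="jet")
fig.colorbar(image, ax=ax, label="Number of players")
ax.set_xlabel(FIRST_ATTRIBUTE)
ax.set_ylabel(SECOND_ATTRIBUTE)
ax.set_title("Relationship between Overall and Finishing")
```

The colorbar makes the bin colors readable as player counts; fewer bins (e.g. `bins=(30, 30)`) give a smoother picture for integer ratings, at the cost of detail.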
I simply replaced their x and y with my own x and y, but it doesn't work.
I expect the output (density plot) to look like this: