I have a large data set of 18,000 players. Every player has features such as Overall and Finishing, and I want to make a scatter density plot, because with a "normal" scatter plot I can't see where there are more players and where there are fewer.

My normal scatter plot code looks like this:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

DATA_LOCATION = "main_players.csv"
FIRST_ATTRIBUTE = "Finishing"
SECOND_ATTRIBUTE = "Overall"

datas = pd.read_csv(DATA_LOCATION)
x = datas[[FIRST_ATTRIBUTE]]
y = datas[[SECOND_ATTRIBUTE]]
plt.scatter(x, y, color='r')
plt.xlabel('Finishing')
plt.ylabel('Overall')
plt.title('Relationship between Overall and Finishing')

plt.show()

I searched the Internet and found the following code for density plots:

# libraries
import matplotlib.pyplot as plt
import numpy as np

# create data
x = np.random.normal(size=50000)
y = x * 3 + np.random.normal(size=50000)

# Big bins
plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.jet)
#plt.show()

# Small bins
plt.hist2d(x, y, bins=(300, 300), cmap=plt.cm.jet)
#plt.show()

# If you do not set the same values for X and Y, the bins aren't square !
plt.hist2d(x, y, bins=(300, 30), cmap=plt.cm.jet)

#plt.show()

I only replaced its x and y with my own x and y, but it doesn't work.

I expect the output (a density plot) to look like this:

[image: expected density plot]

Nazim Kerimbekov
josf
  • How does your scatterplot look now? Without the data, we cannot reproduce the current behaviour and see how it differs from the expected result. – Valentino Apr 16 '19 at 10:51
  • Possible duplicate of [How to plot a density map in python?](https://stackoverflow.com/questions/24119920/how-to-plot-a-density-map-in-python) – alec_djinn Apr 16 '19 at 11:04
  • @alec_djinn This is not a duplicate of that question. OP needs to plot a 2D histogram of (x, y) pairs. The question you mentioned is about plotting an (x, y, z) image with `imshow`. – Keldorn Apr 16 '19 at 11:16
  • OP, can you show your attempt at using the `hist2d` function? – Keldorn Apr 16 '19 at 11:18
  • @Keldorn the output suggested by the OP looks like the solution given by one of the answers. It was obtained using plt.pcolormesh – alec_djinn Apr 16 '19 at 11:19

1 Answer

Querying a dataframe with a list of column names, like in your code:

x = datas[[FIRST_ATTRIBUTE]]
y = datas[[SECOND_ATTRIBUTE]]

yields pd.DataFrame objects (two-dimensional, with a single column each), which plt.hist2d cannot deal with.

Try:

x = datas[FIRST_ATTRIBUTE]
y = datas[SECOND_ATTRIBUTE]

to get pd.Series objects, which are one-dimensional. You should be able to plot these with plt.hist2d.
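As a minimal sketch of the whole thing: since main_players.csv is not available here, the block below builds a synthetic stand-in DataFrame (the distributions and the 0.3 slope are made up purely for illustration) and plots it with plt.hist2d-style binning. Only the column names Finishing and Overall come from the question; the colorbar is an optional extra, not something the question asked for.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for main_players.csv (assumption: both columns are numeric).
rng = np.random.default_rng(0)
finishing = rng.normal(50, 15, size=18000).clip(1, 99)
overall = (0.3 * finishing + rng.normal(50, 6, size=18000)).clip(1, 99)
datas = pd.DataFrame({"Finishing": finishing, "Overall": overall})

# Single brackets return 1-D Series; double brackets would return (n, 1) DataFrames.
x = datas["Finishing"]
y = datas["Overall"]

fig, ax = plt.subplots()
# hist2d returns the bin counts, the bin edges, and the drawn image.
counts, xedges, yedges, image = ax.hist2d(x, y, bins=(50, 50), cmap=plt.cm.jet)
fig.colorbar(image, ax=ax, label="Number of players")
ax.set_xlabel("Finishing")
ax.set_ylabel("Overall")
ax.set_title("Overall vs. Finishing (player density)")
# plt.show()  # uncomment to display the figure
```

With your real data, replace the synthetic DataFrame with datas = pd.read_csv(DATA_LOCATION) and keep the single-bracket indexing. If the plot looks too blocky or too sparse, adjust bins, as in the example you found.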

warped