2

I've been googling for an hour or so and haven't found what I am looking for. Here is where I am at in my code.

I used BeautifulSoup to pull the information down and save it to a CSV file. The CSV has x,y coordinates which I can make into a scatterplot.

The data looks similar to this (there are about 1,500 data points, and with values 0-9 there are obviously only 100 possible combinations):

x,y
0,6
1,2
0,7
4,6
9,9
0,0
4,4
1,2
etc.

What I would like to do is make the size of the points on the scatterplot scale with how often each (x, y) combination appears.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("book8.csv")
df.plot(kind='scatter', x='x', y='y')
plt.show()

The arrays are just numbers between 0 and 9. I'd like to make the size scale to how often combinations of 0-9 show up.

I currently just have this; it's obviously not very useful.

https://i.stack.imgur.com/daiXF.jpg

Do I need to put x and y into their own arrays to accomplish this, instead of using the DataFrame (`df`)?

pault
scarebear
  • @pault disagree, I think it's a pre-processing issue; it's not trivial to convert that output (a 1D array of sizes) to a 2D array of points to plot. – roganjosh Jan 30 '18 at 22:40
  • @roganjosh True. I supposed that the difficulty was in figuring out how to change the marker size, rather than in getting the frequencies. In any case, [this post](https://stackoverflow.com/questions/14827650/pyplot-scatter-plot-marker-size) may be useful. – pault Jan 30 '18 at 22:43
  • You may want to look into [`hexbin`](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.axes.Axes.hexbin.html). Not quite what you're looking for, but it produces a similar output. – pault Jan 30 '18 at 22:48
  • Have you explored the `bokeh` or `seaborn` libraries? They might have a function for this. – Mr. T Jan 30 '18 at 22:54
  • https://matplotlib.org/examples/pylab_examples/scatter_demo2.html – f5r5e5d Jan 30 '18 at 23:10
  • This is probably rather a question of getting the counts of tuples than about plotting. This would be answered [here](https://stackoverflow.com/questions/11260770/python-tuple-operations-and-count). – ImportanceOfBeingErnest Jan 31 '18 at 00:40

2 Answers

5

I'm not sure how I could push this into numpy just yet (I'll keep thinking). In the meantime, a Python solution:

import matplotlib.pyplot as plt
import random
from collections import Counter

# Simulate 1,000 random (x, y) points
x_vals = [random.randint(0, 10) for x in range(1000)]
y_vals = [random.randint(0, 10) for x in range(1000)]

# Count how often each (x, y) combination occurs
combos = list(zip(x_vals, y_vals))
weight_counter = Counter(combos)

# Look up the count for each point and use it as the marker size
weights = [weight_counter[(x_vals[i], y_vals[i])] for i, _ in enumerate(x_vals)]

plt.scatter(x_vals, y_vals, s=weights)
plt.show()
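For the DataFrame in the question, the same counting can be done with pandas rather than `Counter` - a sketch, assuming the CSV columns are named `x` and `y` (the sample data here just mirrors the rows shown in the question):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data mirroring the question's CSV; in practice: pd.read_csv("book8.csv")
df = pd.DataFrame({
    "x": [0, 1, 0, 4, 9, 0, 4, 1],
    "y": [6, 2, 7, 6, 9, 0, 4, 2],
})

# Count occurrences of each (x, y) pair, then attach the count to every row
counts = df.groupby(["x", "y"]).size().rename("n").reset_index()
df = df.merge(counts, on=["x", "y"])

# Scale marker area by frequency (multiplied so size differences are visible)
df.plot(kind="scatter", x="x", y="y", s=df["n"] * 20)
plt.show()
```

Note that `s` in matplotlib is the marker *area* in points², so the multiplier is just a visual tuning knob.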
roganjosh
  • I think I can make this solution work. For my reference, it looks like we are getting the number of combos from the arrays. Next, we are assigning the % weight of each combo to the overall. Then setting the size in the scatterplot to the weight of the combo. I was thinking this was the route I was going to have to go, just wasn't sure how to get it going! Thank you! – scarebear Jan 30 '18 at 23:28
  • "Next, we are assigning the % weight of each combo to the overall." No, there is no percentage used at all. This code supplies the raw occurrences of each combination of `(x, y)` to `matplotlib` and leaves it at that. How it figures out how to translate `s` into blob sizes is inside the library. – roganjosh Jan 30 '18 at 23:39
  • Yes! That's what I meant, my mind was on the comment about normalization with percentages. Thank you. – scarebear Jan 30 '18 at 23:42
1

You could plot a `Circle` patch for each point, with `fill=True`.

You would then count the occurrences of each combination and either make the radius a percentage of some maximum radius, or simply add a fixed amount to a circle's radius for each occurrence of its combination.
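A minimal sketch of that idea with `matplotlib.patches.Circle`, normalising radii by the maximum count (as Mr. T's comment below suggests) so no circle can outgrow the plot; the points here are made up for illustration:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
from collections import Counter

# Made-up sample points; (0, 6) occurs three times, (1, 2) twice
points = [(0, 6), (1, 2), (0, 7), (1, 2), (0, 6), (0, 6)]
counts = Counter(points)
max_count = max(counts.values())

fig, ax = plt.subplots()
for (x, y), n in counts.items():
    # Radius proportional to frequency, normalised so the largest radius is 0.4
    ax.add_patch(Circle((x, y), radius=0.4 * n / max_count, fill=True))

ax.set_xlim(-1, 10)
ax.set_ylim(-1, 10)
ax.set_aspect("equal")  # keep circles round
plt.show()
```

Unlike `scatter`'s `s` (which is in screen points²), a `Circle` radius is in data coordinates, so the circles resize with the axes when you zoom.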

IMCoins
  • The issue is not how to change the size of the markers, it's how to do it in accordance with the frequency of the combinations. – roganjosh Jan 30 '18 at 22:46
  • @roganjosh I think he suggested increasing the radius with each new occurrence of an `(x, y)` tuple in the list. The problem I see with this approach is that there is no upper limit to the process - one dot might fill the whole screen. You have to know the maximum frequency and normalise all other values accordingly. – Mr. T Jan 30 '18 at 22:50
  • Making it a % would be the same as normalizing. – IMCoins Jan 30 '18 at 22:51