Python: How to find the number of items in each point on scatterplot and produce list?

Question

Right now I have a dataset of 1206 participants who have each endorsed a certain number of traumatic experiences and a number of symptoms associated with the trauma.

This is part of my dataframe (full dataframe is 1206 rows long):

SubjectID	PTSD_Symptom_Sum	PTSD_Trauma_Sum
1223	3	5
1224	4	2
1225	2	6
1226	0	3

I have two issues that I am trying to figure out:

I was able to create a scatter plot, but I can't tell from this plot how many participants are in each data point. Is there any easy way to see the number of subjects in each data point?

I used this code to create the scatterplot:

import matplotlib.pyplot as plt

plt.scatter(PTSD['PTSD_Symptom_SUM'], PTSD['PTSD_Trauma_SUM'])
plt.title('Trauma Sum vs. Symptoms')
plt.xlabel('Symptoms')
plt.ylabel('Trauma Sum')
plt.show()

I haven't been able to produce a list of the number of people endorsing each pair of items (symptom number and trauma number). I can run this code to count the number of people in each category separately:

count_sum = PTSD['PTSD_SUM'].value_counts()  # participants per trauma sum
count_symptom_sum = PTSD['PTSD_symptom_SUM'].value_counts()  # participants per symptom sum

print(count_sum)
print(count_symptom_sum)

Which produces this output:

0    379
1    371
2    248
3    130
4     47
5     17
6     11
8      2
7      1
Name: PTSD_SUM, dtype: int64
0    437
1    418
2    247
3     74
4     23
5      4
6      3
Name: PTSD_symptom_SUM, dtype: int64

Is it possible to alter the code to count the number of people endorsing each pair of items (symptom number and trauma number)? If not, are there any functions that would allow me to do this?

Maybe check out https://stackoverflow.com/questions/32589829/how-to-get-value-counts-for-multiple-columns-at-once-in-pandas-dataframe? It looks like the second answer is what you want. — ddulaney, Jan 25 '21 at 17:54
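Counting pairs directly is possible in pandas. A minimal sketch, assuming pandas 1.1 or newer (where DataFrame.value_counts accepts a subset of columns) and a small hypothetical DataFrame that reuses the column names from the value_counts output above; pd.crosstab is shown as an alternative that lays the same counts out as a symptom-by-trauma grid:

```python
import pandas as pd

# Hypothetical toy data; the real PTSD DataFrame has 1206 rows.
PTSD = pd.DataFrame({
    "SubjectID": [1223, 1224, 1225, 1226, 1227, 1228],
    "PTSD_symptom_SUM": [3, 4, 2, 0, 3, 0],
    "PTSD_SUM": [5, 2, 6, 3, 5, 3],
})

# One entry per observed (symptom, trauma) pair, most frequent first.
pair_counts = PTSD.value_counts(subset=["PTSD_symptom_SUM", "PTSD_SUM"])
print(pair_counts)

# The same counts as a grid: rows are symptom sums, columns are trauma sums,
# and pairs that nobody endorsed appear as 0.
grid = pd.crosstab(PTSD["PTSD_symptom_SUM"], PTSD["PTSD_SUM"])
print(grid)
```

The value_counts form is the "list" asked for; the crosstab form is handy when you also want to see which combinations never occur.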

score 1 · Accepted Answer · answered Jan 25 '21 at 18:42


You could create a new DataFrame with the count of each ('PTSD_symptom_SUM', 'PTSD_SUM') pair:

counts = PTSD.groupby(by=['PTSD_symptom_SUM', 'PTSD_SUM']).size().to_frame('size').reset_index()

and then use Seaborn like this:

import seaborn as sns
sns.scatterplot(data=counts, x="PTSD_symptom_SUM", y="PTSD_SUM", hue="size", size="size")

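This produces a scatter plot in which each observed pair appears once, with both its color and its marker size reflecting the number of participants (the original plot image is not available).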

Juan Pablo
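To read the exact number of participants at each point (the first part of the question), one option is to write the count next to every marker with Matplotlib's Axes.annotate. A minimal sketch, assuming a counts DataFrame built with the same groupby as in this answer; the toy data, the marker-size scale factor of 100, and the Agg backend line (there only so the sketch runs without a display) are assumptions to adjust:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the real PTSD DataFrame.
PTSD = pd.DataFrame({
    "PTSD_symptom_SUM": [3, 4, 2, 0, 3, 0],
    "PTSD_SUM": [5, 2, 6, 3, 5, 3],
})

# One row per observed pair plus the number of participants with it.
counts = (PTSD.groupby(by=["PTSD_symptom_SUM", "PTSD_SUM"])
              .size()
              .to_frame("size")
              .reset_index())

fig, ax = plt.subplots()
# Marker area grows with the count; the factor 100 is an arbitrary choice.
ax.scatter(counts["PTSD_symptom_SUM"], counts["PTSD_SUM"],
           s=counts["size"] * 100, alpha=0.5)

# Label each point with its exact count, offset slightly so it stays readable.
for x, y, n in zip(counts["PTSD_symptom_SUM"], counts["PTSD_SUM"], counts["size"]):
    ax.annotate(str(n), (x, y), xytext=(5, 5), textcoords="offset points")

ax.set_title("Trauma Sum vs. Symptoms")
ax.set_xlabel("Symptoms")
ax.set_ylabel("Trauma Sum")
```

In an interactive session, call plt.show() at the end to display the figure.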

Thank you so much! Just one question: in my graph, the legend seemed to group the counts into categories of 60, 120, 180, 240, and 300. Do you know how to change the grouping or show the specific numbers? – Riley Jan 25 '21 at 19:10

The sizes are derived automatically from the data. There is no parameter (or at least I haven't found one yet) that lets you override them. If the answer was useful to you, please leave your vote. I appreciate it! – Juan Pablo Jan 25 '21 at 20:13
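One setting that may help with the legend grouping: seaborn's scatterplot takes a legend parameter, and its default can fall back to a "brief" legend that shows only an evenly spaced subset of the hue/size values, which would explain steps like 60, 120, 180. Passing legend="full" requests one entry per unique value. A sketch on hypothetical toy data (the Agg backend line is only there so it runs without a display); with many distinct counts the full legend can become long:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the real PTSD DataFrame.
PTSD = pd.DataFrame({
    "PTSD_symptom_SUM": [3, 4, 2, 0, 3, 0],
    "PTSD_SUM": [5, 2, 6, 3, 5, 3],
})
counts = (PTSD.groupby(by=["PTSD_symptom_SUM", "PTSD_SUM"])
              .size()
              .to_frame("size")
              .reset_index())

# legend="full" asks for one legend entry per unique value of the
# hue/size variable instead of a "brief" evenly spaced subset.
ax = sns.scatterplot(data=counts, x="PTSD_symptom_SUM", y="PTSD_SUM",
                     hue="size", size="size", legend="full")
```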

score 0 · Answer 2 · answered Jan 25 '21 at 18:01


If I understood correctly, your dataframe is:

SubjectID TraumaSum Symptoms
1         1         5
2         3         4
...

So you just need:

PTSD.groupby(by=['PTSD_SUM', 'PTSD_symptom_SUM']).count()

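This returns, for each unique (PTSD_SUM, PTSD_symptom_SUM) pair, the number of non-missing values in every remaining column (such as SubjectID), which equals the number of participants with that pair.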
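The difference between .count() and .size() matters if you want a single number per pair. A minimal sketch on a hypothetical three-column DataFrame: .count() returns one column per non-key column, while .size() returns exactly one count per pair:

```python
import pandas as pd

# Hypothetical toy data with the same three columns as the question.
PTSD = pd.DataFrame({
    "SubjectID": [1223, 1224, 1225, 1226, 1227, 1228],
    "PTSD_symptom_SUM": [3, 4, 2, 0, 3, 0],
    "PTSD_SUM": [5, 2, 6, 3, 5, 3],
})

grouped = PTSD.groupby(by=["PTSD_SUM", "PTSD_symptom_SUM"])

# .count(): a DataFrame with one column per non-key column (here SubjectID),
# holding the number of non-missing values in each group.
per_column = grouped.count()
print(per_column)

# .size(): a Series with exactly one number per (trauma, symptom) pair.
pair_sizes = grouped.size()
print(pair_sizes)
```

When SubjectID has no missing values, both give the same numbers; .size() is the shorter route to a plain list of pair counts.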

FrancecoMartino

I updated the post to show a portion of the dataset. Is there a way to use that to show only the count of each pair? Thank you so much! – Riley Jan 25 '21 at 18:20
