
Right now I have a dataset of 1206 participants who have each endorsed a certain number of traumatic experiences and a number of symptoms associated with the trauma.

This is part of my dataframe (full dataframe is 1206 rows long):

SubjectID  PTSD_Symptom_Sum  PTSD_Trauma_Sum
1223       3                 5
1224       4                 2
1225       2                 6
1226       0                 3

I have two issues that I am trying to figure out:

  1. I was able to create a scatter plot, but I can't tell from it how many participants share each data point. Is there an easy way to see the number of subjects at each point?

I used this code to create the scatterplot:

import matplotlib.pyplot as plt

plt.scatter(PTSD['PTSD_Symptom_SUM'], PTSD['PTSD_Trauma_SUM'])
plt.title('Trauma Sum vs. Symptoms')
plt.xlabel('Symptoms')
plt.ylabel('Trauma Sum')
plt.show()

[Figure: scatter plot of Trauma Sum by number of symptoms]

  2. I haven't been able to produce a list of the number of people endorsing each pair of items (symptom number and trauma number). I can run this code to count the number of people in each category:
count_sum = PTSD['PTSD_SUM'].value_counts()
count_symptom_sum = PTSD['PTSD_symptom_SUM'].value_counts()

print(count_sum)
print(count_symptom_sum)

Which produces this output:

0    379
1    371
2    248
3    130
4     47
5     17
6     11
8      2
7      1
Name: PTSD_SUM, dtype: int64
0    437
1    418
2    247
3     74
4     23
5      4
6      3
Name: PTSD_symptom_SUM, dtype: int64

Is it possible to alter the code to count the number of people endorsing each pair of items (symptom number and trauma number)? If not, are there any functions that would allow me to do this?

Riley
  • Maybe check out https://stackoverflow.com/questions/32589829/how-to-get-value-counts-for-multiple-columns-at-once-in-pandas-dataframe? It looks like the second answer is what you want. – ddulaney Jan 25 '21 at 17:54
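Following that pointer: since pandas 1.1, `DataFrame.value_counts` can count unique combinations across several columns, which gives the per-pair counts directly. A minimal sketch, assuming (as the `value_counts` output above suggests) that the trauma and symptom columns are named `PTSD_SUM` and `PTSD_symptom_SUM`; the rows are hypothetical sample data, not the real dataset.

```python
import pandas as pd

# Hypothetical sample data: the four rows shown in the question plus one
# made-up row that repeats a pair, so that a count above 1 appears.
PTSD = pd.DataFrame({
    "SubjectID": [1223, 1224, 1225, 1226, 1227],
    "PTSD_symptom_SUM": [3, 4, 2, 0, 3],
    "PTSD_SUM": [5, 2, 6, 3, 5],
})

# Count unique (symptom, trauma) rows; the result is a Series indexed by the pair,
# sorted from the most to the least common pair.
pair_counts = PTSD.value_counts(subset=["PTSD_symptom_SUM", "PTSD_SUM"])

# Turn it into an ordinary table with one row per pair.
table = pair_counts.reset_index(name="n_participants")
print(table)
```

On the real data, `table` would have one row per observed (symptom, trauma) combination, and its `n_participants` column would sum to 1206.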

2 Answers


You could create a new DataFrame with the count of each ('PTSD_symptom_SUM', 'PTSD_SUM') pair:

counts = PTSD.groupby(by=['PTSD_symptom_SUM', 'PTSD_SUM']).size().to_frame('size').reset_index()

and then use Seaborn like this:

import seaborn as sns
sns.scatterplot(data=counts, x="PTSD_symptom_SUM", y="PTSD_SUM", hue="size", size="size")

To obtain something like this:

[Image: the resulting scatter plot]
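If seaborn isn't available, a rough matplotlib-only equivalent of this answer's plot might look like the sketch below. It reuses the answer's `groupby(...).size()` step on a few hypothetical rows; the marker-area scale factor (40) and the `viridis` colormap are arbitrary choices, not part of the answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data using the column names from the answer.
PTSD = pd.DataFrame({
    "PTSD_symptom_SUM": [3, 4, 2, 0, 3, 3],
    "PTSD_SUM": [5, 2, 6, 3, 5, 5],
})

# One row per (symptom, trauma) pair, with the number of participants in column "size".
counts = (PTSD.groupby(by=["PTSD_symptom_SUM", "PTSD_SUM"])
              .size()
              .to_frame("size")
              .reset_index())

fig, ax = plt.subplots()
# Marker area grows with the count (scale factor 40 is arbitrary); colour encodes the same count.
points = ax.scatter(counts["PTSD_symptom_SUM"], counts["PTSD_SUM"],
                    s=counts["size"] * 40, c=counts["size"], cmap="viridis")
fig.colorbar(points, ax=ax, label="Number of participants")
ax.set_title("Trauma Sum vs. Symptoms")
ax.set_xlabel("Symptoms")
ax.set_ylabel("Trauma Sum")
# Use plt.show() or fig.savefig("scatter.png") to view the result.
```

The colorbar gives a continuous scale for the counts, which avoids the coarse size bins in a legend.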

Juan Pablo
  • Thank you so much! Just one question: in my graph, the sizes looked like they were grouped into categories of 60, 120, 180, 240, and 300. Do you know how to change the grouping or get specific numbers? – Riley Jan 25 '21 at 19:10
  • The sizes are chosen automatically from the data. There is no parameter (or at least I haven't found one yet) that allows overriding them. If the answer was useful for you, please leave your vote. I appreciate it! – Juan Pablo Jan 25 '21 at 20:13
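On the question of specific numbers: `seaborn.scatterplot` does have a `legend` parameter, and `legend="full"` gives every distinct size value its own legend entry instead of an evenly spaced sample. Another option is to write the exact count next to each point. The sketch below does that with plain matplotlib on hypothetical data; the offset of 6 points and the marker scaling are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; replace with the real PTSD DataFrame.
PTSD = pd.DataFrame({
    "PTSD_symptom_SUM": [3, 4, 2, 0, 3, 3, 0],
    "PTSD_SUM": [5, 2, 6, 3, 5, 5, 3],
})

# Number of participants for each (symptom, trauma) pair.
counts = PTSD.groupby(["PTSD_symptom_SUM", "PTSD_SUM"]).size().reset_index(name="size")

fig, ax = plt.subplots()
ax.scatter(counts["PTSD_symptom_SUM"], counts["PTSD_SUM"],
           s=counts["size"] * 40, alpha=0.5)

# Label every point with its exact count instead of relying on binned legend entries.
for row in counts.itertuples(index=False):
    ax.annotate(str(row.size), (row.PTSD_symptom_SUM, row.PTSD_SUM),
                textcoords="offset points", xytext=(6, 6))

ax.set_xlabel("Symptoms")
ax.set_ylabel("Trauma Sum")
```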

If I understood properly, your dataframe is:

SubjectID TraumaSum Symptoms
1         1         5
2         3         4
...

So you just need: PTSD.groupby(by=['PTSD_SUM', 'PTSD_symptom_SUM']).size()

This returns the number of participants for each unique (PTSD_SUM, PTSD_symptom_SUM) pair.
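On a grouped DataFrame, `.count()` and `.size()` differ. `.count()` returns a DataFrame with the number of non-null values in every other column for each group. `.size()` returns a single Series with the number of rows per pair. `pd.crosstab` shows the same counts as a trauma × symptom grid. A sketch on hypothetical rows, assuming the column names from the question's output:

```python
import pandas as pd

# Hypothetical sample data using the column names from the question's output.
PTSD = pd.DataFrame({
    "SubjectID": [1223, 1224, 1225, 1226, 1227],
    "PTSD_SUM": [5, 2, 6, 3, 5],
    "PTSD_symptom_SUM": [3, 4, 2, 0, 3],
})

grouped = PTSD.groupby(by=["PTSD_SUM", "PTSD_symptom_SUM"])

# .count(): a DataFrame of non-null counts for each remaining column (here only SubjectID).
per_column = grouped.count()

# .size(): a Series holding just the number of rows in each (trauma, symptom) pair.
pair_sizes = grouped.size()

# pd.crosstab: rows are trauma sums, columns are symptom sums, cells are counts (0 if no one).
table = pd.crosstab(PTSD["PTSD_SUM"], PTSD["PTSD_symptom_SUM"])
print(table)
```

The crosstab layout is handy if the goal is a table rather than a plot, because empty combinations appear explicitly as 0.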

FrancecoMartino
  • I updated the post to show a portion of the dataset. Is there a way to use that to show only the count of each pair? Thank you so much – Riley Jan 25 '21 at 18:20