Please be gentle, as this is my first post and I'm really new to Python/pandas, etc. I'm trying to use Python and seaborn/matplotlib to help with data analysis through visualizations. The data I'm working with is a survey with multiple questions, 4 response categories (A-D), the name of the respondent, and a score (4-10).
The goal is to give each question its own row of graphs, with one graph per response (A-D), the respondent on the x-axis, and the score on the y-axis.
I can get the base factor plot working; however, I'm having difficulty getting the formatting right (if that's even possible). What I would like to do is:
1) On each chart's x-axis, display only the names of the respondents who gave that specific response. I tried setting sharex=False, but that didn't seem to work. For example, the first chart (Q1, Response A) should only show 3 names, not all of them.
Bonus) If there were some way to show the score inside each bar, that would be awesome as well!
FIXED 2) Add the mean score for each question/response combination to its chart. At first I was only able to hard-code it for the last chart, but I would like it on every chart.
Thanks in advance. The code I'm currently using is below.
#import modules
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
#data
question = ['Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3',
'Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3',
'Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3',
'Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3',
'Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3']
response = ['A', 'C', 'D', 'D', 'D', 'C', 'B',
'A', 'C', 'C', 'C', 'C', 'C', 'C',
'C', 'C', 'A', 'D', 'A', 'A', 'C',
'D', 'C', 'A', 'B', 'B', 'B', 'A',
'A', 'A']
name = ['name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name3',
'name3', 'name3', 'name4', 'name4', 'name4', 'name5', 'name5',
'name5', 'name6', 'name6', 'name6', 'name7', 'name7', 'name7',
'name8', 'name8', 'name8', 'name9', 'name9', 'name9', 'name10',
'name10', 'name10']
score = [6, 6, 5, 10, 9, 10, 4, 5, 8, 9, 6, 7, 9, 10,
5, 4, 6, 10, 10, 6, 6, 5, 8, 9, 9, 6, 4, 10, 7, 4]
data = pd.DataFrame()
data['question'] = question
data['response'] = response
data['name'] = name
data['score'] = score
#set up questions to loop through
question = ['Q1','Q2','Q3']
#calculate mean of combination of question/response and export to dictionary
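#select 'score' first so the text column 'name' is not averaged (newer pandas raises an error otherwise)
grouped = data.groupby(['question','response'])[['score']].mean()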
d = grouped.to_dict()
#iterate through each question and create factorplots
for i in question:
    p = data[data['question']==i]
    g = sns.factorplot(x='name',y='score', data=p, kind='bar',
                       col='response', col_order = ['A','B','C','D'],
                       col_wrap=4, sharey=False)
    for j,ax in enumerate(g.axes.flat):
        if j == 0:
            ax.axhline(y=d['score'][i,'A'], c='r', ls='dashed')
        elif j == 1:
            ax.axhline(y=d['score'][i,'B'], c='r', ls='dashed')
        elif j == 2:
            ax.axhline(y=d['score'][i,'C'], c='r', ls='dashed')
        else:
            ax.axhline(y=d['score'][i,'D'], c='r', ls='dashed')
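As a side note on the if/elif chain in the loop above: I suspect the per-panel mean could be looked up by key rather than by panel position. The following is only a rough sketch of that idea using the Q1 rows from my sample data. The names `q1`, `mean_by_key`, `col_order`, and `panel_means` are placeholders I made up for this illustration.

```python
import pandas as pd

# Q1 rows only, copied from the sample data in this post.
q1 = pd.DataFrame({
    'question': ['Q1'] * 10,
    'response': ['A', 'D', 'B', 'C', 'C', 'C', 'A', 'D', 'B', 'A'],
    'name': ['name%d' % n for n in range(1, 11)],
    'score': [6, 10, 4, 9, 9, 4, 10, 5, 9, 10],
})

# Mean score per (question, response); selecting 'score' first keeps
# the text column 'name' out of the aggregation.
means = q1.groupby(['question', 'response'])['score'].mean()

# Plain dict keyed by (question, response) tuples.
mean_by_key = means.to_dict()

# One lookup per panel, in the same order the panels are drawn.
# dict.get returns None if a question has no rows for a response.
col_order = ['A', 'B', 'C', 'D']
panel_means = [mean_by_key.get(('Q1', r)) for r in col_order]
print(panel_means)
```

Inside the plotting loop, the same idea would presumably become `for ax, r in zip(g.axes.flat, col_order):` followed by a single `ax.axhline(y=d['score'][i, r], c='r', ls='dashed')`, assuming the panels come back in `col_order` order.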
example output - https://i.stack.imgur.com/ZEmIT.jpg
sample data / format is as follows - https://i.stack.imgur.com/Yh4u1.png
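For reference, here is the kind of layout I'm after for item 1 and the bonus, drawn with plain matplotlib subplots instead of the factor plot. This is only a sketch under my own assumptions: it covers the Q1 rows only, and `q1`, `col_order`, the figure size, and the white label color are arbitrary choices of mine. I don't know whether the same result can be had from the factor plot itself.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Q1 rows only, copied from the sample data in this post.
q1 = pd.DataFrame({
    'response': ['A', 'D', 'B', 'C', 'C', 'C', 'A', 'D', 'B', 'A'],
    'name': ['name%d' % n for n in range(1, 11)],
    'score': [6, 10, 4, 9, 9, 4, 10, 5, 9, 10],
})

col_order = ['A', 'B', 'C', 'D']
# plt.subplots does not share x axes unless sharex is set, so each
# panel only gets ticks for the names actually plotted on it.
fig, axes = plt.subplots(1, len(col_order), figsize=(16, 3))

for ax, resp in zip(axes, col_order):
    sub = q1[q1['response'] == resp]
    ax.set_title('Q1 | response = %s' % resp)
    ax.set_ylabel('score')
    if sub.empty:
        continue  # nobody gave this response; leave the panel blank
    bars = ax.bar(sub['name'].tolist(), sub['score'].tolist())
    # Bonus: write each score in the middle of its bar.
    for bar, value in zip(bars, sub['score'].tolist()):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() / 2,
                str(value), ha='center', va='center', color='white')
    # Item 2: dashed line at the mean score for this response.
    ax.axhline(y=sub['score'].mean(), c='r', ls='dashed')

fig.tight_layout()
```

With the sample data, the Response A panel ends up with only name1, name7, and name10 on its x-axis, which is the behaviour I was hoping sharex=False would give me.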