Please be gentle, as this is my first post and I'm really new to Python/pandas. I'm trying to use Python with seaborn/matplotlib to help with data analysis through visualizations. The data I'm working with is a survey with multiple questions; each row holds the question, one of 4 response categories (A-D), the name of the respondent, and a score (4-10).

The goal is to give each question its own row of graphs, with one graph per response (A-D), the respondent on the x-axis, and the score on the y-axis.

I can get the basic factor plot working, but I'm having difficulty getting the formatting right (if it's even possible). What I would like to do is:

1) On the x-axis of each chart in a row, display only the names of the respondents who gave that specific response. I tried setting sharex=False, but that didn't seem to work. For example, the first chart (Q1, Response A) should show only 3 names, not all of them.

Bonus) If there were some way to show the score inside each bar, that would be awesome as well!

FIXED 2) Add the mean for the question and response to each chart. Originally I could only hard-code it onto the last chart, but I wanted it on every chart.

Thanks in advance. The code I'm currently using is below.

#import modules
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
#Jupyter/IPython magic: render plots inline in the notebook
%matplotlib inline

#data

question = ['Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3', 
            'Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3', 
            'Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3', 
            'Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3', 
           'Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3']
response = ['A', 'C', 'D', 'D', 'D', 'C', 'B',
            'A', 'C', 'C', 'C', 'C', 'C', 'C', 
            'C', 'C', 'A', 'D', 'A', 'A', 'C', 
            'D', 'C', 'A', 'B', 'B', 'B', 'A', 
            'A', 'A']
name = ['name1', 'name1', 'name1', 'name2', 'name2', 'name2', 'name3',
        'name3', 'name3', 'name4', 'name4', 'name4', 'name5', 'name5', 
        'name5', 'name6', 'name6', 'name6', 'name7', 'name7', 'name7',
        'name8', 'name8', 'name8', 'name9', 'name9', 'name9', 'name10',
        'name10', 'name10']
score = [6, 6, 5, 10, 9, 10, 4, 5, 8, 9, 6, 7, 9, 10,
         5, 4, 6, 10, 10, 6, 6, 5, 8, 9, 9, 6, 4, 10, 7, 4]

data = pd.DataFrame({'question': question,
                     'response': response,
                     'name': name,
                     'score': score})

#set up questions and responses to loop through
#(separate names so the 'question' data list above is not overwritten)
questions = ['Q1','Q2','Q3']
responses = ['A','B','C','D']

#calculate mean score of each question/response combination and export to dictionary
#(select 'score' first so the non-numeric 'name' column is not averaged)
means = data.groupby(['question','response'])['score'].mean().to_dict()

#iterate through each question and create factorplots
#(note: factorplot was renamed to catplot in seaborn 0.9)
for q in questions:
    p = data[data['question']==q]
    g = sns.factorplot(x='name', y='score', data=p, kind='bar',
                       col='response', col_order=responses,
                       col_wrap=4, sharey=False)

    #draw the mean for this question/response on the matching chart
    for ax, r in zip(g.axes.flat, responses):
        ax.axhline(y=means[q, r], c='r', ls='dashed')

example output - https://i.stack.imgur.com/ZEmIT.jpg

sample data / format is as follows - https://i.stack.imgur.com/Yh4u1.png
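
For reference, the three requirements above (only the relevant names per chart, the score inside each bar, and a mean line on every chart) can be sketched with plain matplotlib subplots instead of a seaborn facet grid. This is only a rough sketch under assumptions, not a confirmed fix for the factorplot behaviour: plot_question is a hypothetical helper name, the sample frame holds just the Q1 rows from the data above, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import pandas as pd


def plot_question(df, question, responses=("A", "B", "C", "D")):
    """Hypothetical helper: one row of bar charts for a single question."""
    sub = df[df["question"] == question]
    fig, axes = plt.subplots(1, len(responses),
                             figsize=(4 * len(responses), 3), squeeze=False)
    for ax, resp in zip(axes[0], responses):
        panel = sub[sub["response"] == resp]
        ax.set_title(f"{question} | response = {resp}")
        if panel.empty:
            continue  # nobody gave this response, leave the chart empty
        positions = list(range(len(panel)))
        scores = panel["score"].tolist()
        bars = ax.bar(positions, scores)
        # only the respondents in this panel appear on its x-axis
        ax.set_xticks(positions)
        ax.set_xticklabels(panel["name"].tolist())
        # write each score inside its bar, at half the bar height
        for bar, value in zip(bars, scores):
            ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() / 2,
                    str(value), ha="center", va="center", color="white")
        # dashed line at the mean score of this question/response
        ax.axhline(y=panel["score"].mean(), c="r", ls="dashed")
        ax.set_ylabel("score")
    fig.tight_layout()
    return fig, axes[0]


# Q1 rows taken from the data in the question
sample = pd.DataFrame({
    "question": ["Q1"] * 10,
    "response": ["A", "D", "B", "C", "C", "C", "A", "D", "B", "A"],
    "name": ["name%d" % n for n in range(1, 11)],
    "score": [6, 10, 4, 9, 9, 4, 10, 5, 9, 10],
})
fig, axes = plot_question(sample, "Q1")
```

Because each chart is filtered before plotting, the x-axes never share categories, so the Q1 / Response A chart ends up with only its three respondents.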

  • You should make your example reproducible. No one else has your spreadsheet, so you need to mock up your DataFrame within your question. http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Paul H Nov 01 '16 at 21:46
  • Paul H - Thanks for the suggestion! I removed the excel import and added the data I'm using so that it should be reproducible now. – user7096854 Nov 02 '16 at 01:44
  • Ok... had a breakthrough and was able to fix the code to address #2 through adding a new for loop to enumerate and add the mean on each specific axis. – user7096854 Nov 05 '16 at 03:50
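
The per-axis loop described in the last comment could look roughly like the sketch below. It is an illustration under assumptions rather than the poster's actual code: add_mean_lines is a hypothetical helper, the means dictionary is assumed to be keyed by (question, response) tuples, and the example values are made up. Unlike a direct dictionary lookup, it skips any question/response combination that has no data instead of raising a KeyError.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt


def add_mean_lines(axes, means, question, responses):
    """Hypothetical helper: draw a dashed mean line on each axis.

    `means` maps (question, response) tuples to a mean score.
    Returns the responses for which a line was actually drawn.
    """
    drawn = []
    for ax, resp in zip(axes, responses):
        mean = means.get((question, resp))
        if mean is None:
            continue  # no rows for this combination, so no line
        ax.axhline(y=mean, c="r", ls="dashed")
        drawn.append(resp)
    return drawn


# made-up means for illustration; response C is deliberately missing
example_means = {("Q1", "A"): 8.0, ("Q1", "B"): 6.5, ("Q1", "D"): 7.5}
fig, axes = plt.subplots(1, 4)
drawn = add_mean_lines(axes, example_means, "Q1", ["A", "B", "C", "D"])
```

With a seaborn grid, g.axes.flat could be passed as the axes argument, provided the responses are listed in the same order as col_order.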
