Chart to show relationships - multiple nonnumerical columns

Question

I am currently doing a research project using data on individuals who have been referred as victims of modern slavery.

My original plan was to draw a simple bar chart of counts for each column. However, I need to show the relationships between columns (for example, most of the child referrals are British females, referred for sexual exploitation). I have played around with scatter plots but can't get my head around them with non-numerical data. I've used groupby to get counts, but I don't know how to turn those counts into a graph.

Here is a small DataFrame I made as an example; the actual data is much larger:

import pandas as pd

test = {'Gender': ['Female', 'Female', 'Male', 'Male', 'Female'],
        'Age': ['Adult', 'Adult', 'Child', 'Child', 'Adult'],
        'Nationality': ['British', 'British', 'Vietnamese', 'Albanian', 'British'],
        'Type': ['Sexual', 'Sexual', 'Sexual', 'Labour', 'Criminal'],
        }

df = pd.DataFrame(test, columns=['Gender', 'Age', 'Nationality', 'Type'])

# Count the rows for each observed combination of the four columns
dfcount = df.groupby(["Gender", "Age", "Nationality", "Type"]).size().reset_index(name="count")

Please share the data as `text` rather than as a photo so that it is easy to reproduce, thanks! — Alexandre B., Apr 21 '20 at 09:44
https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples?noredirect=1&lq=1 — Arne, Apr 21 '20 at 13:00

score 0 · Accepted Answer · answered Apr 21 '20 at 17:00

Since you have so many categorical variables to display, a good solution would be to use facet plots. This means you create multi-plot grids, where every subplot is of the same kind, but applied to different subsets of the data, as defined by the values of certain categorical variables. The seaborn library is great for this purpose.

For example, the following draws a grid of four subplots, one for each combination of Gender and Age. Each subplot shows Type on the x-axis and the count on the y-axis, with points colored by Nationality. It assumes a DataFrame that already contains a column of counts, such as dfcount from the question with its count column:

import seaborn as sns

# dfcount is the grouped DataFrame from the question; 'count' holds the counts
grid = sns.catplot(x='Type', y='count',
                   row='Gender', col='Age',
                   hue='Nationality',
                   data=dfcount, kind='point')

For customization and other options, see the seaborn documentation for sns.catplot() and sns.pointplot().
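
If you would rather not build the count column yourself, one possible variation is to pass the raw rows to sns.catplot() with kind='count' and let seaborn do the counting. The sketch below uses the toy data from the question. The name counts is only an illustrative helper that tabulates the same numbers the bars represent, and the faceting choices are an assumption about what is worth comparing:

```python
import pandas as pd
import seaborn as sns

# Toy data from the question (the real dataset is much larger).
df = pd.DataFrame({
    'Gender': ['Female', 'Female', 'Male', 'Male', 'Female'],
    'Age': ['Adult', 'Adult', 'Child', 'Child', 'Adult'],
    'Nationality': ['British', 'British', 'Vietnamese', 'Albanian', 'British'],
    'Type': ['Sexual', 'Sexual', 'Sexual', 'Labour', 'Criminal'],
})

# kind='count' counts the rows per category, so no precomputed
# count column is needed. One subplot per Gender/Age combination.
grid = sns.catplot(data=df, x='Type',
                   row='Gender', col='Age',
                   hue='Nationality', kind='count')

# Illustrative helper: the same counts as a table, one row per
# combination that actually occurs in the data.
counts = (df.groupby(['Gender', 'Age', 'Nationality', 'Type'])
            .size()
            .reset_index(name='Count'))
print(counts)
```

In a script, call matplotlib.pyplot.show() afterwards to display the figure. Combinations of Gender and Age that never occur in the data still get a subplot, which simply stays empty.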
