I've created a function for plotting the data (see below) using a FacetGrid.

import matplotlib.pyplot as plt
import seaborn as sns


def barplots(data, col, hue, x, y):
    sns.set_style(style="darkgrid")
    sns.set_context("paper", font_scale=2)
    g = sns.FacetGrid(
        data,
        col=col,
        hue=hue,
        palette="tab20c",
        legend_out=False,
        col_wrap=5,
        height=15,
    )
    g.map(sns.catplot, x=x, y=y)
    plt.show()


col = "military_civilian"
hue = "sex"
y = "age_at_selection"
x = "nationality_2"
data = nationality_astronauts

barplots(data, col, hue, x, y)

I keep getting `ValueError: Could not interpret input 'nationality_2'`.

Can anybody figure out what's happening here?

    nationality_2   military_civilian  sex     age_at_selection   age_at_selection  age_at_selection      hours_mission          eva_hrs_mission
                                               youngest_selected  oldest_selected   average_age_selected  total_eva_hrs_mission  total_eva_hrs_mission
    Canada          civilian           female  29                 38                32                    805.75                 0
    Canada          civilian           male    30                 50                37.57142857           11036.93               24.28
    Canada          military           male    29                 34                32.375                5410.02                22.01
    China           military           female  34                 34                34                    303.5                  0
    China           military           male    32                 45                40.15384615           3662                   0.26
    France          civilian           female  28                 28                28                    614.4                  0
    France          civilian           male    31                 36                33.5                  5127.63                13
    France          military           male    27                 42                34.28571429           9351.91                31.79
    Germany         civilian           male    33                 42                35.63636364           11584.06               12.97
    Germany         military           male    34                 39                36.4                  8953.1                 14.25
    Italy           civilian           male    42                 52                45.33333333           854.42                 0
    Italy           military           female  32                 32                32                    4783.5                 0
    Italy           military           male    33                 41                36                    17037.25               26.88
    Japan           civilian           female  29                 33                31.66666667           930.8                  0
    Japan           civilian           male    29                 47                32.8125               32299.35               60.11
    Japan           military           male    39                 39                39                    3400                   0
    Rest of world   civilian           female  26                 28                27                    450.22                 0
    Rest of world   civilian           male    25                 46                34.9375               15783.61               105.8
    Rest of world   military           male    27                 42                34.64705882           17785.96               4.72
    U.S.            civilian           female  26                 47                32.34065934           77986.735              180.33
    U.S.            civilian           male    25                 60                35.41832669           142271.82              1266.607
    U.S.            military           female  32                 36                33.17647059           28430.5                105.42
    U.S.            military           male    26                 53                34.63807531           257079.295             1440.23
    U.S.S.R/Russia  civilian           female  30                 32                31.6                  8767                   3.58
    U.S.S.R/Russia  civilian           male    25                 48                33.325                227418.79              429.52
    U.S.S.R/Russia  military           female  25                 25                25                    70.83                  0
    U.S.S.R/Russia  military           male    23                 45                30.30481283           449779.468             933.707

johnadem
  • Your source table has **3** columns named *age_at_selection*. When the table is read, they are renamed *age_at_selection*, *age_at_selection.1* and *age_at_selection.2*. Do you want all 3 of these columns to be plotted in consecutive rows? And why did you pass *col_wrap=5*? Please also add a picture, or otherwise describe what data should appear in each plot cell. – Valdi_Bo Jan 29 '21 at 16:41
  • I've tried this but I'm still getting the same error. I just want to plot age_at_selection against nationality_2. I want to set hue='sex' and plot it on grid military_civilian. It seems as if it should be simple to do but the error remains. Not sure why – johnadem Jan 29 '21 at 16:43
  • [Like this](https://imgur.com/a/VeGkOZ8)? And what do you mean by `plot it on grid military_civilian`? This category has exactly two values. Not much of a grid. – Mr. T Jan 29 '21 at 16:53
  • I'm trying to plot this on a FacetGrid with military_civilian being a column. Hope that's clear – johnadem Jan 29 '21 at 16:59
  • @Mr.T that's what I was looking for, how did you manage that? – johnadem Jan 29 '21 at 17:00
  • @Valdi_Bo thanks for spotting that! I've amended the source table now to what it should be. – johnadem Jan 29 '21 at 17:18
  • Do you have a **multi-index** on columns? Your description is still unclear. – Valdi_Bo Jan 29 '21 at 17:55
  • @Valdi_Bo that's correct, the columns are multi-indexed. Not sure how to flatten them out to a single index. What I'm trying to achieve is this > https://imgur.com/a/VeGkOZ8 – johnadem Jan 29 '21 at 17:57
  • `g.map(sns.barplot, data=data, x=x, y=y, hue=hue)`, and I removed `col_wrap` and `hue` from `FacetGrid`. However, it was not clear that you have a multi-index dataframe. You may want to read [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Mr. T Jan 29 '21 at 18:12
  • @Mr.T Thanks for the link. That solution doesn't work on the data I have. Maybe it's due to the multi-indexed columns – johnadem Jan 29 '21 at 18:30
  • The link describes how you should provide sample data so that other SO users can reproduce your problem, not how to address your problem. Only if we can reproduce your problem can we suggest how to resolve it. – Mr. T Jan 29 '21 at 18:37

1 Answer

One possible solution:

  1. Flatten the multi-index on the columns to a single level:

    nationality_astronauts.columns = ['nationality_2', 'military_civilian', 'sex',
        'Youngest', 'Oldest', 'Average', 'hours_mission', 'eva_hrs_mission']
    
  2. Generate the source data:

    data = nationality_astronauts.iloc[:, 0:6].melt(id_vars=['nationality_2', 
        'military_civilian', 'sex'], value_vars=['Youngest', 'Oldest', 'Average'],
        var_name='At selection', value_name='Age')
    

    The initial part of data is:

      nationality_2 military_civilian     sex At selection   Age
    0        Canada          civilian  female     Youngest  29.0
    1        Canada          civilian    male     Youngest  30.0
    2        Canada          military    male     Youngest  29.0
    3         China          military  female     Youngest  34.0
    4         China          military    male     Youngest  32.0
    
  3. Create the plot, with a separate row for each value of At selection (one per source age column):

    g = sns.catplot(data=data, col='military_civilian', row='At selection',
        hue='sex', height=4, x='nationality_2', y='Age')
    g.set_xticklabels(rotation=90);
    

The result is a grid of plots: one column per military_civilian value and one row per At selection value.

To change the colors, pass a palette of your choice through the palette argument.
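
Steps 1 and 2 can be tried without the full dataset. The sketch below is only an illustration, not the asker's actual data: it builds a three-row stand-in frame from the Canada rows of the question, and it assumes the second column level is an empty string for the three id columns (after reading a real file, that level may hold a different placeholder). The names `df` and `long_df` are made up for this example. Instead of typing the new names by hand, it keeps the second-level label wherever one exists.

```python
import pandas as pd

# Hypothetical stand-in for nationality_astronauts with two-level columns.
# Assumption: the second level is "" for the id columns.
columns = pd.MultiIndex.from_tuples([
    ("nationality_2", ""),
    ("military_civilian", ""),
    ("sex", ""),
    ("age_at_selection", "youngest_selected"),
    ("age_at_selection", "oldest_selected"),
    ("age_at_selection", "average_age_selected"),
])
rows = [
    ("Canada", "civilian", "female", 29, 38, 32.0),
    ("Canada", "civilian", "male", 30, 50, 37.57142857),
    ("Canada", "military", "male", 29, 34, 32.375),
]
df = pd.DataFrame(rows, columns=columns)

# Step 1: flatten the columns, keeping the second-level label where
# there is one and the first-level label otherwise.
df.columns = [lower if lower else upper for upper, lower in df.columns]

# Step 2: reshape to long form, one row per (group, age statistic) pair.
long_df = df.melt(
    id_vars=["nationality_2", "military_civilian", "sex"],
    var_name="At selection",
    value_name="Age",
)
print(long_df)
```

The resulting `long_df` has the same layout as `data` in step 2, so it can be passed to `sns.catplot` as in step 3. Since `catplot` draws a strip plot by default, adding `kind="bar"` is one way to get bars instead.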

Valdi_Bo