4

I am trying to draw several violins in a single plot using seaborn. My dataframe has one column of categorical values (to be used on the x-axis) and, for each categorical value, an array of numbers stored in a single cell (to be used to draw that category's violin). A small working example would be this:

import numpy as np
import pandas as pd
import seaborn as sns
foo = pd.DataFrame(columns=['Names', 'Values'])
for i in range(10):
    foo.loc[i] = ['no' + str(i), np.random.normal(i, 2, 10)]

But when trying

sns.violinplot(x='Names', y='Values', data=foo)

I get the following error

ValueError: Neither the x nor y variable appears to be numeric.
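The error is consistent with how the column is stored: each cell of `Values` holds a whole NumPy array, so pandas gives the column the generic `object` dtype and seaborn finds no numeric column to plot. A minimal sketch of that check follows; the seeded `rng` generator and the dict-based constructor are assumptions made here for reproducibility, not part of the original example.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator so the example is reproducible.
rng = np.random.default_rng(0)

# Same shape as the frame in the question: one array of 10 samples per cell.
foo = pd.DataFrame({
    "Names": ["no" + str(i) for i in range(10)],
    "Values": [rng.normal(i, 2, 10) for i in range(10)],
})

# The column holds Python objects (arrays), not numbers.
print(foo["Values"].dtype)          # object
print(type(foo.loc[0, "Values"]))   # <class 'numpy.ndarray'>
```

Each row would have to contribute one number, not one array, before `sns.violinplot` can treat `Values` as numeric.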

Now I could be hacky and just spread each array across several rows, like this:

foo = pd.DataFrame(columns =['Names','Values'])
for i in range(3):
    bar = np.random.normal(i,2,10)
    for j,b in enumerate(bar):
        foo.loc[i*10+j] = ['no'+str(i),b]

which yields the plot I want:


But I'm guessing there is a simpler solution that doesn't require restructuring my dataframe.

emilaz

2 Answers

3

pd.DataFrame.explode() turns your column of arrays into separate rows, one value per cell. The exploded column still has the generic object dtype, so convert it to float first; after that, sns.violinplot can plot it without further effort.

foo = foo.explode('Values')
foo['Values'] = foo['Values'].astype('float')
sns.violinplot(data=foo, x='Names', y='Values')
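To see why the cast is needed, here is a small sketch of what `explode` returns for this kind of frame. It assumes pandas 0.25 or later, and uses a seeded `rng` generator and three categories purely for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: seeded generator and 3 categories, chosen only for brevity.
rng = np.random.default_rng(0)
foo = pd.DataFrame({
    "Names": ["no" + str(i) for i in range(3)],
    "Values": [rng.normal(i, 2, 10) for i in range(3)],
})

# One row per array element; the original row label is repeated for each.
exploded = foo.explode("Values")
print(exploded.shape)            # (30, 2)
print(exploded["Values"].dtype)  # object: numbers, but stored as objects

# Cast to a real numeric dtype so plotting libraries recognise the column.
exploded["Values"] = exploded["Values"].astype(float)
print(exploded["Values"].dtype)  # float64
```

Note that `explode` keeps the original index, so labels repeat (0, 0, ..., 1, 1, ...); chaining `.reset_index(drop=True)` gives a fresh 0..n-1 index if that matters downstream.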


gosuto
2

In pandas 0.25 and later you can use explode; earlier versions need one of the manual unnesting workarounds instead:

result = foo.explode('Values').reset_index(drop=True)
result = result.assign(Names=result['Names'].astype('category'), 
                       Values=result['Values'].astype(np.float32))

sns_plot = sns.violinplot(x='Names', y='Values', data=result)

Output (figure): violin plot of categorical data

Exploding (or unnesting) will transform your data into:

   Names     Values
0    no0   3.352148
1    no0   2.195788
2    no0   1.234673
3    no0   0.084360
4    no0   1.778226
..   ...        ...
95   no9  12.385434
96   no9   9.849669
97   no9  11.360196
98   no9   8.535900
99   no9   9.369197

[100 rows x 2 columns]

The assign call converts the dtypes to:

Names     category
Values     float32
dtype: object
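For pandas versions older than 0.25, where `explode` is not available, one possible manual unnesting is sketched below with `np.repeat` and `np.concatenate`. This is only one of several workarounds and not necessarily the one originally linked; the seeded `rng` generator and the name `long_df` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Assumption: seeded generator and 3 categories, chosen only for brevity.
rng = np.random.default_rng(0)
foo = pd.DataFrame({
    "Names": ["no" + str(i) for i in range(3)],
    "Values": [rng.normal(i, 2, 10) for i in range(3)],
})

# Number of samples stored in each cell of 'Values'.
lengths = foo["Values"].map(len).values

# Repeat each name once per sample and flatten all arrays into one.
long_df = pd.DataFrame({
    "Names": np.repeat(foo["Names"].values, lengths),
    "Values": np.concatenate(foo["Values"].values),
})

print(long_df.shape)            # (30, 2)
print(long_df["Values"].dtype)  # float64
```

Because `np.concatenate` already yields a float array, no separate `astype` step is needed here, and `long_df` can be passed to `sns.violinplot(x='Names', y='Values', data=long_df)` in the same way as the exploded frame.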
Dani Mesejo
  • My data is already in the 'post-explosion' format. My values are times in HH:mm format. The approach in https://stackoverflow.com/questions/52289579/plot-datetime-time-in-seaborn works for scatterplot but not for violinplot. – Unknow0059 Nov 07 '20 at 14:41