
I am working with the Titanic dataset and using seaborn's plotting functions to visualize the distribution of the data. However, I do not understand the arguments of distplot or what its output represents. I would like to know what the parameters in the following lines do, especially bins, ax=axes[0], and kde=False.

ax = sns.distplot(women[women['Survived'] == 1].Age.dropna(), bins=18,
                  label=survived, ax=axes[0], kde=False)

ax = sns.distplot(women[women['Survived'] == 0].Age.dropna(), bins=40,
                  label=not_survived, ax=axes[0], kde=False)

Graph

I have already looked up distplot in the documentation and searched online, but I could not find a clear explanation.

nbro
Animesh Jaiswal

2 Answers

  1. axes[0]

Based on your code, I assume axes is a list (or array) of Axes objects, so axes[0] accesses the first one. Passing ax=axes[0] tells seaborn to draw on that Axes, which is the left-hand plot when the subplots sit side by side. Please see this helpful post.
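As a rough illustration of where axes[0] comes from, here is a minimal sketch. It assumes the figure was created with plt.subplots(nrows=1, ncols=2), which the question does not show, and it uses matplotlib's own hist with made-up ages instead of distplot so that it does not depend on the seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical ages standing in for the Titanic columns (not real data)
ages_survived = [22, 38, 26, 35, 27, 14, 4, 58, 55, 31]
ages_not_survived = [2, 31, 40, 18, 21, 8, 29, 45, 25, 50]

# One row, two columns: axes is an array holding two Axes objects
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Both histograms are drawn on the same (left) Axes, so they overlap
axes[0].hist(ages_survived, bins=18, alpha=0.6, label="survived")
axes[0].hist(ages_not_survived, bins=40, alpha=0.6, label="not survived")
axes[0].set_title("Female")
axes[0].legend()

# The right-hand Axes stays empty until something is plotted with axes[1]
axes[1].set_title("Male")
```

Call fig.savefig("plot.png") or, with an interactive backend, plt.show() to see the result. Any plotting function that accepts ax= can be pointed at axes[0] or axes[1] in the same way.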

  2. kde=False

By default, distplot draws both a histogram and a kernel density estimate (KDE) curve. kde=False hides the KDE curve so that only the histogram is displayed.
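To give a feel for what the hidden curve is, here is a minimal pure-Python sketch of a Gaussian kernel density estimate. It is not seaborn's implementation; the function name, the sample ages, and the bandwidth choice (Scott's rule of thumb) are assumptions made for illustration.

```python
import math
import statistics

def gaussian_kde(values, bandwidth=None):
    """Return a function that estimates the density at a point x."""
    n = len(values)
    if bandwidth is None:
        # Scott's rule of thumb; an assumption for this sketch
        bandwidth = statistics.stdev(values) * n ** (-1 / 5)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of one Gaussian bump centred on every data point
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density

# Hypothetical ages (not real Titanic data)
ages = [2, 4, 8, 14, 18, 21, 22, 24, 26, 27,
        29, 30, 31, 35, 38, 40, 45, 50, 55, 58]
density = gaussian_kde(ages)

# A density integrates to about 1, unlike raw histogram counts
step = 0.5
grid = [i * step for i in range(-200, 400)]
area = sum(density(x) * step for x in grid)
```

The curve is a smoothed version of the histogram, scaled so that the area under it is 1.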

  3. bins

Statistically speaking, a histogram is a non-parametric estimate of a distribution, and its shape depends on the number of bins. You should therefore not pick a bin count arbitrarily if you want the plot to represent the distribution of your data. A common way to choose the number of bins is the Freedman–Diaconis rule, which is also what .distplot() uses by default when bins is not given. In other words, when you use .distplot() to show a distribution, it is usually better not to specify the bins argument.
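The rule can be sketched in plain Python as follows. This is an illustration of the rule itself (bin width = 2 * IQR / n^(1/3)), not seaborn's code; the function name, the sample ages, and the fallback for a zero bin width are assumptions.

```python
import math
import statistics

def freedman_diaconis_bins(values):
    """Number of bins suggested by the Freedman-Diaconis rule."""
    data = sorted(values)
    n = len(data)
    # Quartiles; the interquartile range (IQR) is q3 - q1
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    width = 2 * (q3 - q1) / n ** (1 / 3)
    if width == 0:
        # Assumed fallback when the IQR is zero: square-root rule
        return max(1, int(math.sqrt(n)))
    return math.ceil((data[-1] - data[0]) / width)

# Hypothetical ages (not real Titanic data)
ages = [2, 4, 8, 14, 18, 21, 22, 24, 26, 27,
        29, 30, 31, 35, 38, 40, 45, 50, 55, 58]
suggested = freedman_diaconis_bins(ages)
print(suggested)
```

Because the bin width shrinks only with the cube root of n, larger samples get more bins, but far fewer than one per data point.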

steven

First, what is distplot? It is a function in the seaborn Python library, called as sns.distplot().

It is used to plot a histogram.

You may be wondering why you would plot a histogram. A histogram visualizes the distribution of a numeric variable as bars.

The data you pass as the first argument, here the ages of the women with women['Survived']==1 and women['Survived']==0, is laid out along the x-axis. The y-axis shows how many values fall into each bar (with kde=False these are raw counts).

bins sets how many equal-width intervals the range of the data is divided into, with one bar per interval, as in your bins=18 and bins=40.
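What the bars represent can be sketched in a few lines of plain Python. The function name and the ages are made up for illustration, and the sketch assumes the values are not all identical.

```python
def histogram_counts(values, bins):
    """Split [min, max] into `bins` equal-width intervals and count per interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes hi > lo
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last interval
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical ages (not real Titanic data)
ages = [2, 4, 8, 14, 18, 21, 22, 24, 26, 27,
        29, 30, 31, 35, 38, 40, 45, 50, 55, 58]

coarse = histogram_counts(ages, bins=4)   # few wide bars
fine = histogram_counts(ages, bins=14)    # many narrow bars
print(coarse)
print(fine)
```

Each number is the height of one bar. More bins give narrower bars with fewer values in each, which is why the same data can look quite different with bins=18 and bins=40.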

Here is the signature of sns.distplot():

Syntax:

sns.distplot(
    a,
    bins=None,
    hist=True,
    kde=True,
    rug=False,
    fit=None,
    hist_kws=None,
    kde_kws=None,
    rug_kws=None,
    fit_kws=None,
    color=None,
    vertical=False,
    norm_hist=False,
    axlabel=None,
    label=None,
    ax=None,
)

With these parameters you can customize the histogram. Follow this great tutorial on drawing a seaborn histogram with sns.distplot.

Rudra Mohan