I want to do some EDA on the dataset for my ML project. I'm trying to make a distplot from the buisness_year column of the DataFrame X_train.

My code:

 sns.distplot(X_train['buisness_year'])

I'm supposed to get 2019 and 2020 as the two x-axis values but instead I'm getting this:

[Image: screenshot of the resulting plot]

The buisness_year column:

7106     2019.0
11451    2019.0
39750    2019.0
27629    2020.0
20892    2019.0
          ...  
1710     2019.0
18852    2019.0
6540     2019.0
35400    2019.0
6027     2019.0
Name: buisness_year, Length: 23494, dtype: float64

How do I fix this so that the business years appear on the x-axis?

desertnaut
Glenn Ho
  • It is actually showing values for 2019 and 2020; they are just displayed in scientific format (you can see the +2.02e3 on the axis). The two spikes you see are for 2019 and 2020. – Manjunath K Mayya Feb 21 '22 at 15:00
  • You could use a `sns.countplot(X_train['buisness_year'])`, which makes more sense than a distribution. – JohanC Feb 21 '22 at 15:52

1 Answer

With my sample values, the years display clearly as 2017, 2018, 2019, etc. In your case, the tick labels are converted to scientific format. Try the code below, which sets the formatter's scientific flag to False.

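import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Your code that builds X_train goes here

# Grab the current axes and turn off scientific notation on the x-axis.
ax = plt.gca()
ax.get_xaxis().get_major_formatter().set_scientific(False)
# A label such as "+2.02e3" is an axis offset, so disable that as well.
ax.get_xaxis().get_major_formatter().set_useOffset(False)

# distplot draws on the current axes, so the formatter settings above apply.
sns.distplot(X_train['buisness_year'])
plt.show()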
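As a self-contained check of the idea, here is a minimal sketch that combines the formatter fix with the countplot suggestion from the comments. The `years` Series is a made-up stand-in for `X_train['buisness_year']`, and the bin edges and explicit tick positions are my own assumptions chosen for two year values; adjust them for your data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample standing in for X_train['buisness_year'] (float years).
years = pd.Series([2019.0, 2019.0, 2019.0, 2020.0, 2019.0], name="buisness_year")

fig, (ax_hist, ax_count) = plt.subplots(1, 2, figsize=(8, 3))

# Option 1: keep a histogram, but print plain tick labels with no offset.
# The bin edges are assumptions that put each year in its own bin.
hist_counts, _, _ = ax_hist.hist(years, bins=[2018.5, 2019.5, 2020.5])
ax_hist.ticklabel_format(axis="x", style="plain", useOffset=False)
ax_hist.set_xticks([2019, 2020])

# Option 2: treat the year as a category and count rows per year.
# Casting to int avoids labels such as "2019.0".
int_years = years.astype(int)
year_counts = int_years.value_counts().sort_index()
sns.countplot(x=int_years, ax=ax_count)

fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure in a script. For a column that only holds a couple of distinct years, the count-based plot is usually easier to read than a density estimate.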
Manjunath K Mayya