
I am learning how to plot using Seaborn. I have a dataset of values (lengths of strings) and their frequencies. Most values fall between -50K and +50K, but some lie outside that range. I would like to plot the distribution split into ranges, e.g. [<-50K, -5K], [-5K, -500], [-500, 500], [500, 5K], [5K, >50K].

I would like each range to have its own number of bins (and, if possible, its own color). For example, [<-50K, -5K] would have 100 bins while [-500, 500] would have 200 bins, etc.
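To make the goal concrete, here is a minimal sketch of one possible approach: build a single array of bin edges by concatenating one `np.linspace` per range, so that each range gets its own bin count. The helper name `build_edges`, the range limits, the bin counts, and the synthetic data are all illustrative assumptions; in practice the outer limits would probably come from the data's minimum and maximum.

```python
import numpy as np

def build_edges(ranges):
    """Concatenate per-range bin edges into one increasing array.

    `ranges` is a list of (low, high, n_bins) tuples (a hypothetical layout);
    consecutive ranges must share a boundary (high of one == low of the next).
    """
    edges = [np.array([ranges[0][0]], dtype=float)]
    for low, high, n_bins in ranges:
        # n_bins bins need n_bins + 1 edges; drop the first edge because it
        # duplicates the last edge already stored.
        edges.append(np.linspace(low, high, n_bins + 1)[1:])
    return np.concatenate(edges)

# Illustrative limits and bin counts (assumptions, not taken from the real data).
ranges = [
    (-50_000, -5_000, 100),
    (-5_000, -500, 50),
    (-500, 500, 200),
    (500, 5_000, 50),
    (5_000, 50_000, 100),
]
edges = build_edges(ranges)

# Synthetic stand-in for the LEN column, clipped to the outer limits.
rng = np.random.default_rng(0)
values = rng.normal(0, 8_000, size=10_000).clip(-50_000, 50_000)

# One histogram whose bin width changes from range to range.
counts, _ = np.histogram(values, bins=edges)
```

The same `edges` array can be passed as `bins=edges` to `sns.histplot`. Because the bins have unequal widths, `stat="density"` may give a fairer visual comparison than raw counts.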

This is what I have tried so far. Any suggestions would be appreciated on how to make a better plot.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

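# sns.load_dataset() only fetches seaborn's example datasets; read a local CSV with pandas
data = pd.read_csv("test.csv")
# bin edges must increase monotonically, so the stray trailing 500 is dropped
newbins = [-50000, -5000, -500, 500, 5000, 50000]
sns.displot(data, x="LEN", log_scale=(False, True), bins=newbins)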

Here is the plot using the above code

I also tried pd.cut to create a histogram for each range (as shown in the figure below), but I would like each range to have a number of bins instead of one single bar.

df['bin'] = pd.cut(df['LEN'], bins=[-np.inf, -5000, -500, 500, 5000, np.inf])
sns.catplot(data=df, kind='count', x='bin', height=4, aspect=2, log='y')

Histogram based on range

Also, I tried separating the CSV file into 4 different parts and plotting four different plots side by side. Would it be possible to make them one plot instead of four different ones?

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
fig, axs = plt.subplots(ncols=4)
df1 = pd.read_csv('LEN_LT_NEG_10K.csv')
sns.histplot(df1, x="LEN",log_scale=(False, True), bins=50, kde=True, ax=axs[0])
df2 = pd.read_csv('LEN_IN_NEG_10K_50.csv')
sns.histplot(df2, x="LEN",log_scale=(False, True), bins=150, kde=True, ax=axs[1])
df3 = pd.read_csv('LEN_IN_50_10K.csv')
sns.histplot(df3, x="LEN",log_scale=(False, True), bins=150, kde=True, ax=axs[2])
df4 = pd.read_csv('LEN_GT_10K.csv')
sns.histplot(df4, x="LEN",log_scale=(False, True), bins=5, kde=True, ax=axs[3])
plt.savefig('LEN_ALL.png', bbox_inches="tight")

Plot for the above code
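As a rough sketch of what a single combined plot could look like, the ranges can be drawn onto one Axes, one `hist` call per range, each with its own bin count and color. This uses Matplotlib's `Axes.hist` directly rather than seaborn. The synthetic data, range limits, bin counts, and the `linthresh` value are assumptions for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the LEN column (assumption: real data comes from a CSV).
rng = np.random.default_rng(0)
values = rng.normal(0, 8_000, size=10_000)

# (low, high, n_bins) per range; the outer limits follow the data.
lo, hi = values.min(), values.max()
ranges = [
    (lo, -5_000, 100),
    (-5_000, -500, 50),
    (-500, 500, 200),
    (500, 5_000, 50),
    (5_000, hi, 100),
]

fig, ax = plt.subplots(figsize=(10, 4))
for i, (low, high, n_bins) in enumerate(ranges):
    # `range=` restricts this call to values inside [low, high];
    # everything outside is ignored, so each range is drawn once.
    ax.hist(
        values,
        bins=n_bins,
        range=(low, high),
        color=f"C{i}",
        label=f"[{low:.0f}, {high:.0f}]",
    )

ax.set_yscale("log")                    # log counts, as in the attempts above
ax.set_xscale("symlog", linthresh=500)  # linear near zero, logarithmic beyond
ax.set_xlabel("LEN")
ax.set_ylabel("count")
ax.legend(title="range")
```

The symmetric-log x axis keeps the narrow [-500, 500] range readable next to the wide outer ranges; dropping that line gives a plain linear axis. A value lying exactly on a shared boundary would be counted in both neighboring ranges, which is usually negligible for data like this.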

  • This is essentially just a count plot. `df['bin'] = pd.cut(df.LEN, bins=[-np.inf, -5000, -500, 500, 5000, np.inf])` and `g = sns.catplot(data=df, kind='count', x='bin', height=4, aspect=2, log='y')` [See plot](https://i.stack.imgur.com/Y0QSf.png). – Trenton McKinney Mar 11 '23 at 17:05
