For the dataframe below,I want to write a function that
I. extract the outliers for each column and export the output as a csv file (I need help with this one)
II. visualize using boxplot and export as pdf file
Outlier definition: as boundaries ±3 standard deviations from the mean
OR
as being any point of data that lies over 1.5 IQRs below the first quartile (Q1) or above the third quartile (Q3)in a data set. High = (Q3) + 1.5 IQR Low = (Q1) – 1.5 IQR
See below for the dataset and my attempt :
# dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# intialise data of lists.
data = {'region':['R1', 'R1', 'R2', 'R2', 'R2','R1','R1','R1','R2','R2'],
'cost':[120.05, 181.90, 10.21, 133.01, 311.19,2003.4,112.4,763.2,414.8,812.5],
'commission':[110.21, 191.12, 190.21,15.31, 245.09,63.41,811.3,10.34, 153.10, 311.17],
'salary':[10022,19910, 19113,449999, 25519,140.29, 291.07, 390.22, 245.09, 4122.62],
'revenue':[14029, 29100, 39022, 24509, 412271,110.21, 191.12, 190.21, 12.00, 245.09],
'tax':[120.05, 181.90, 10.34, 153.10, 311.17,52119,32991,52883,69359,57835],
'debt':[100.22,199.10, 191.13,199.99, 255.19,41218, 52991,1021,69152,79355],
'income': [43211,7672991,56881,211,77342,100.22,199.10, 191.13,199.99, 255.19],
'rebate': [31.21,429.01,538.18,621.58,6932.5,120.05, 181.90, 10.34, 153.10, 311.17],
'scale':['small','small','small','mid','mid','large','large','mid','large','small']
}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
df
############## my attempt ####################
def outlier_extractor(data):
# select numeric columns
numeric_columns = data.select_dtypes(include=np.number).columns.tolist()
#(I)Extract and export outliers as csv..... I need help with this one
#(II) boxplot visualization
plt.figure(figsize=(10, 9))
for i, variable in enumerate(numeric_columns):
plt.subplot(4, 4, i + 1)
plt.boxplot(data[variable],whis=1.5)
plt.tight_layout()
plt.title(variable)
plt.savefig('graph_outliers.pdf')
plt.show()
# driver code
outlier_extractor(df)
Please comment and share your full code. Thanks in advance