I'm importing a table that looks like this into my script:

ENSG input_test input_test2 input_test3 ip_test ip_test2 ip_test3
ENSG00000000003.15 1.0 1.0 1.0 3.0 3.0 3.0
ENSG00000000457.14 2.0 2.0 2.0 1.0 1.0 1.0
ENSG00000000460.17 2.0 2.0 2.0 3.0 3.0 3.0

I've been trying to make a violin plot with the code below. I can generate the plot, but I also want to remove outliers. With sns.boxplot I can hide them by passing showfliers=False, but this doesn't seem to work with sns.violinplot.

#!/usr/bin/env python

"""
Usage: Run script in ~/snakemake_eclip/scripts, use help function to see which parameters are needed.

This script takes in the all_reads_matrix made by merge_matrix.py and creates a violin plot.
"""


import pandas as pd
import argparse
import matplotlib.pyplot as plt
import os
import seaborn as sns
plt.switch_backend('agg')

def make_violin(in_matrix, save_path):

    df = pd.read_csv(str(in_matrix), index_col=False)

    # drop zeros

    df = df[(df != 0).all(1)]

    plt.figure(figsize=(20, 10), dpi=300)

    # patch_artist=True

    sns.violinplot(data=df, showfliers=False)

    plt.plot()

    plt.title("Read Counts of Individual ENSG")
    plt.xlabel("Samples")
    plt.ylabel("Read Count")

    plt.savefig(os.path.join(str(save_path), 'all_reads_matrix_boxplot.pdf'))

if __name__ == '__main__':

    parser = argparse.ArgumentParser(description='Create a violin plot from all_reads_matrix.csv')

    parser.add_argument("--in_matrix",
                        help='name of input matrix')

    parser.add_argument("--save_path",
                        help='path to save')

    # parse out arguments

    args = parser.parse_args()

    # build and save the violin plot

    make_violin(args.in_matrix, args.save_path)

Plot looks like this:

[Image: violin plot]

  • How do you define an outlier? – jfaccioni Sep 28 '21 at 18:32
  • Does it matter that you're making a violin plot? Could this question be boiled down to, "how do I remove outliers from a dataset"? – Paul H Sep 28 '21 at 18:32
  • I'm attempting to automate this process for a pipeline I'm designing, so the data will be different each time. I won't be able to manually select for outliers, so I'm looking for something that can do what showfliers=False does for sns.boxplot. To answer your question, my supervisor wants a violin plot (I personally think the boxplot here is a better idea). – vegal35866 Sep 28 '21 at 18:41
  • There are multiple ways to define outliers. You need to decide how you want to define an outlier, then filter the dataframe by that condition prior to plotting the data. Using `showfliers=False` simply delegates the decision of "what is the definition of an outlier" to seaborn. – jfaccioni Sep 28 '21 at 18:47
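The filter-before-plotting approach from the last comment could look roughly like the sketch below. It assumes one particular definition of an outlier: the 1.5 × IQR rule, which is the default rule boxplots use to decide which points are drawn as fliers. The helper name `mask_iqr_outliers`, the `whis` parameter, and the demo values are made up for illustration and are not part of seaborn or of the original script.

```python
import pandas as pd


def mask_iqr_outliers(df, whis=1.5):
    """Return the numeric columns of df with per-column IQR outliers set to NaN.

    Hypothetical helper: a value is treated as an outlier if it lies outside
    [Q1 - whis * IQR, Q3 + whis * IQR] for its own column.
    """
    # Keep only numeric columns (drops e.g. the ENSG identifier column).
    numeric = df.select_dtypes(include="number")

    # Per-column quartiles and interquartile range.
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1

    lower = q1 - whis * iqr
    upper = q3 + whis * iqr

    # Boolean frame: True where the value is within its column's bounds.
    inside = numeric.ge(lower, axis="columns") & numeric.le(upper, axis="columns")

    # Values outside the bounds become NaN; everything else is unchanged.
    return numeric.where(inside)


# Made-up demo data: one extreme value in the first column.
demo = pd.DataFrame({
    "input_test": [1.0, 2.0, 2.0, 3.0, 2.0, 100.0],
    "ip_test": [3.0, 1.0, 3.0, 2.0, 2.0, 2.0],
})
filtered = mask_iqr_outliers(demo)
print(filtered)
```

In the script above, this would mean calling something like `sns.violinplot(data=mask_iqr_outliers(df))` in place of passing `showfliers=False`. Masking per column, rather than dropping whole rows, keeps a gene's values in the samples where they are not extreme; whether that is the right choice depends on how you want to define an outlier. It is worth checking that your seaborn version skips the resulting NaN values when estimating each density.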
