I have a list (length 300) of lists (each length 1000). I want to sort the list of 300 by the median of each list of 1000, and then plot a seaborn boxplot of the top 10 (i.e. the 10 lists with the greatest median).

I am able to plot the entire list of 300 but don't know where to go from there.

I can plot a contiguous range of the lists, but how do I plot, for example, data[3], data[45], and data[129] all in the same plot?

ax = sns.boxplot(data = data[0:50])

I can also work out which items in the list are in the top 10 by doing this (but I realise this is not the most elegant way!)

array_median = np.median(data, axis=1)
np_sortedarray = np.sort(np.array(array_median))

sort_panda = pd.DataFrame(array_median)
TwoL = sort_panda.reset_index()
TwoL.sort_values(0)

Ultimately I want a boxplot with 10 boxes, showing the list items that have the greatest median values.

Example of data: list of 300 x 1000 [[1.236762285232544, 1.2303414344787598, 1.196462631225586, ...1.1787045001983643, 1.1760116815567017, 1.1614983081817627, 1.1546586751937866], [1.1349891424179077, 1.1338907480239868, 1.1239897012710571, 1.1173863410949707, ...1.1015456914901733, 1.1005324125289917, 1.1005228757858276], [1.0945734977722168, ...1.091795563697815]]

egeorgia
  • Can you provide some sample data from your list of lists? In your question, I mean. Preferably the top 10 you want to plot, after you've sorted them based on median. – m13op22 May 31 '19 at 14:44
  • Thanks for responding. This is the problem, I don't know how to sort within the list. I need to keep track of the order each list appears in the list as this is important for my analysis. Any ideas on this? – egeorgia May 31 '19 at 14:52
  • Then can you show some of the data from the entire list? – m13op22 May 31 '19 at 14:59
  • Can do! What's the best way of sharing? – egeorgia May 31 '19 at 15:10
  • Adding an example of it in your questions, as suggested [here](https://stackoverflow.com/help/minimal-reproducible-example) – m13op22 May 31 '19 at 15:13

2 Answers

0

See this answer for fetching top 10 elements

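# median: per-list medians as a NumPy array, e.g. np.median(data, axis=1)
idx = (-median).argsort()[:10]
# data must be a NumPy array for this indexing to work, not a plain list
data[idx]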

Also, you can get particular elements of data like this

data[[3, 45, 129]]
Kirill Korolev
  • Thanks! However, when I try to plot in this way (selecting particular elements), I get this error message: TypeError: list indices must be integers or slices, not list – egeorgia May 31 '19 at 15:12
  • 1
    @egeorgia Apparently you're trying to plot list, not numpy array – Kirill Korolev May 31 '19 at 15:16
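
The TypeError reported in the comments arises because a plain Python list cannot be indexed with another list. A minimal sketch of one way around it, assuming every inner list has the same length so the data converts cleanly to a 2-D array. The random data is only a stand-in for the real 300 x 1000 list.

```python
import numpy as np

# Stand-in data (assumption): 300 lists of 1000 floats, as in the question.
rng = np.random.default_rng(0)
data = rng.normal(size=(300, 1000)).tolist()

arr = np.asarray(data)               # shape (300, 1000); supports list indexing
medians = np.median(arr, axis=1)     # one median per inner list

# Indices of the 10 largest medians, highest first.
# These are positions in the original list, so the original order is not lost.
idx = np.argsort(-medians)[:10]
top10 = arr[idx]                     # shape (10, 1000)

# Picking arbitrary rows works the same way.
picked = arr[[3, 45, 129]]           # shape (3, 1000)
```

If the data has to stay a list, a comprehension such as `[data[i] for i in idx]` selects the same rows, and the result can be passed to `sns.boxplot(data=...)` in the same way as the `data[0:50]` slice in the question.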
0

I modified your example data a bit just to make it easier.

import seaborn as sns
import pandas as pd
import numpy as np

data = [[1.236762285232544, 1.2303414344787598, 1.196462631225586, 1.1787045001983643, 1.1760116815567017, 1.1614983081817627, 1.1546586751937866], 
        [1.1349891424179077, 1.1338907480239868, 1.1239897012710571, 1.1173863410949707, 1.1015456914901733, 1.1005324125289917, 1.1005228757858276]]

To sort your data, since it is a list of lists rather than a numpy array, you can use the built-in sorted function with a key. The key is a function applied to each inner list, and its return value (here, the median) is what the lists are sorted by. Setting reverse = True sorts from highest to lowest.

sorted_data = sorted(data, key = lambda x: np.median(x), reverse = True)

To select the top n lists, add [:n] to the end of the previous statement.
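
Sorting the lists themselves discards each list's original position, which the asker said matters. One possible workaround is to sort (index, list) pairs instead. This is a small sketch, not part of the original answer: the toy data and the names `top_n`, `top_indices` and `top_lists` are illustrative, and the standard-library `statistics.median` stands in for `np.median` only to keep the snippet dependency-free.

```python
from statistics import median

# Toy data (assumption): four short lists instead of 300 x 1000.
data = [
    [1.0, 2.0, 3.0],   # median 2.0
    [7.0, 8.0, 9.0],   # median 8.0
    [4.0, 5.0, 6.0],   # median 5.0
    [0.0, 0.5, 1.0],   # median 0.5
]

n = 2  # number of lists to keep

# Pair each list with its original index, sort by median (highest first),
# then slice off the first n pairs.
top_n = sorted(enumerate(data), key=lambda pair: median(pair[1]), reverse=True)[:n]

top_indices = [i for i, _ in top_n]          # original positions
top_lists = [values for _, values in top_n]  # the lists themselves

print(top_indices)  # [1, 2]
```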

To plot in Seaborn, it's easiest to convert your data to a pandas.DataFrame.

df = pd.DataFrame(sorted_data).T  # one column per list, highest median first

That makes a DataFrame with 10 columns (or 2 in this example). We can rename the columns to make each dataset clearer.

df = df.rename(columns={k: f'Data{k+1}' for k in range(len(sorted_data))}).reset_index()

And to plot 2 (or 10) boxplots in one plot, you can reshape the dataframe to have 2 columns, one for the data and one for the dataset number (ID) (credit here).

df = pd.wide_to_long(df, stubnames = ['Data'], i = 'index', j = 'ID').reset_index()[['ID', 'Data']]

And then you can plot it.

sns.boxplot(x='ID', y = 'Data', data = df)
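
As an alternative sketch of the same reshaping, `DataFrame.melt` can build the long-form table while labelling each box with the list's original index rather than its rank. The stand-in data, the helper name `plot_top`, and the choice of `melt` over `wide_to_long` are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): 20 lists of 50 values, shifted so medians differ.
rng = np.random.default_rng(1)
data = (rng.normal(size=(20, 50)) + np.arange(20)[:, None]).tolist()

medians = np.median(np.asarray(data), axis=1)
top = np.argsort(-medians)[:10]      # original indices, highest median first

# Wide frame: one column per selected list, named by its original index.
wide = pd.DataFrame({int(i): data[i] for i in top})

# Long frame: one row per value, with the source list in the "ID" column.
long_df = wide.melt(var_name="ID", value_name="Data")


def plot_top(df, order):
    """Draw one box per ID in the given order (requires seaborn)."""
    import seaborn as sns  # imported here so the reshaping runs without it
    return sns.boxplot(x="ID", y="Data", data=df, order=order)
```

Calling `plot_top(long_df, order=[int(i) for i in top])` should then draw the ten boxes from highest to lowest median, each labelled with its position in the original list.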


m13op22