Get Box Plot From Redundant Rows

Question

I have a dataframe like below:

import pandas as pd
import numpy as np
df = pd.DataFrame({'id': [12124, 12124, 5687, 5687, 7892],
                   'A': [np.nan, np.nan, 3.05, 3.05, np.nan],
                   'B': [1.05, 1.05, np.nan, np.nan, np.nan],
                   'C': [np.nan, np.nan, np.nan, np.nan, np.nan],
                   'D': [np.nan, np.nan, np.nan, np.nan, 7.09]})

I want to get a box plot of columns A, B, C, and D, where redundant row values in each column are counted only once. How do I accomplish that?

Sorry, your question is a bit unclear. Please see how to ask a good pandas question here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples. For a boxplot, you will need to include more copy-and-pasteable rows of data (no images) in your question. You can't create a good boxplot from just a few rows of data. — David Erickson, Jun 27 '21 at 22:25
`df.drop_duplicates().boxplot(column=['A', 'B', 'C', 'D'])` ?? — Henry Ecker, Jun 27 '21 at 22:47
In this case, id and each individual column need to be combined into one composite key, because the same value 3.05 in column A can appear in column A again. — Tranquil Oshan, Jun 27 '21 at 22:50
`drop_duplicates` without a subset will only drop rows if the entire row matches. — Henry Ecker, Jun 27 '21 at 22:51
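The composite-key idea from the comments can be sketched with pandas alone, assuming the intent is to keep one value per (id, column value) pair: deduplicate each column separately with `drop_duplicates(subset=['id', col])`, then drop the NaNs. `dedup_per_column` is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [12124, 12124, 5687, 5687, 7892],
                   'A': [np.nan, np.nan, 3.05, 3.05, np.nan],
                   'B': [1.05, 1.05, np.nan, np.nan, np.nan],
                   'C': [np.nan, np.nan, np.nan, np.nan, np.nan],
                   'D': [np.nan, np.nan, np.nan, np.nan, 7.09]})


def dedup_per_column(frame, key='id', columns=('A', 'B', 'C', 'D')):
    """Hypothetical helper: keep one row per (key, column value) pair, per column.

    Deduplicating on the composite key [key, col] keeps the same value if it
    appears under a different id. NaNs are dropped afterwards because they
    carry no data for a box plot.
    """
    result = {}
    for col in columns:
        unique_rows = frame.drop_duplicates(subset=[key, col])
        result[col] = unique_rows[col].dropna().reset_index(drop=True)
    return result


per_col = dedup_per_column(df)
for col, values in per_col.items():
    print(col, values.tolist())
```

On this five-row sample the result matches plain `df.drop_duplicates()`, because the duplicated rows are identical across every column; the per-column subset only makes a difference when two rows share an id and a value in one column but differ in another.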

score 1 · Answer 1 · answered Jun 28 '21 at 03:56

pandas works only with DataFrames in which every column has the same length and every row has the same length; in other words, it only processes rectangular (frame-shaped) data. If duplicated values (and nulls) are to be counted only once per column, the columns end up with different lengths, which conflicts with this principle of the pandas package. Here is my suggestion: transform the DataFrame into lists. See the detailed code for transforming a DataFrame into a list. Then you can plot the box plot from the list data and the index column.
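A sketch of the list-based suggestion, assuming matplotlib for the plotting (the answer does not name a library) and reusing the (id, column) composite-key deduplication from the comments. Each column becomes its own plain list, and `Axes.boxplot` accepts a sequence of lists of unequal length, so the shape restriction described above does not apply. The output filename `boxplot.png` is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [12124, 12124, 5687, 5687, 7892],
                   'A': [np.nan, np.nan, 3.05, 3.05, np.nan],
                   'B': [1.05, 1.05, np.nan, np.nan, np.nan],
                   'C': [np.nan, np.nan, np.nan, np.nan, np.nan],
                   'D': [np.nan, np.nan, np.nan, np.nan, 7.09]})

columns = ['A', 'B', 'C', 'D']

# One plain list per column: keep a single row per (id, column value) pair,
# then drop NaNs. The lists may end up with different lengths.
data = [df.drop_duplicates(subset=['id', col])[col].dropna().tolist()
        for col in columns]

fig, ax = plt.subplots()
# Axes.boxplot takes a sequence of 1-D datasets of unequal length;
# an empty list (column C here) produces no visible box.
bp = ax.boxplot(data)
# Boxes sit at x positions 1..n by default; label them with the column names.
ax.set_xticks(range(1, len(columns) + 1))
ax.set_xticklabels(columns)
fig.savefig("boxplot.png")  # arbitrary output filename
```

With only one value left per column in this sample, each box collapses to a single line; the plot becomes informative only with more rows, as the first comment notes.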

