I have a DataFrame like the one below:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'id': [12124, 12124, 5687, 5687, 7892],
    'A': [np.nan, np.nan, 3.05, 3.05, np.nan],
    'B': [1.05, 1.05, np.nan, np.nan, np.nan],
    'C': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'D': [np.nan, np.nan, np.nan, np.nan, 7.09],
})

I want to draw a box plot of columns A, B, C, and D, where repeated row values in each column are counted only once. How do I accomplish that?

tdy
  • Sorry, your question is a bit unclear. Please see how to ask a good pandas question here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples For a boxplot, you will need to include more copy-and-pasteable rows of data (no images) in your question. You can't create a good boxplot from just a few rows of data – David Erickson Jun 27 '21 at 22:25
  • `df.drop_duplicates().boxplot(column=['A', 'B', 'C', 'D'])` ?? – Henry Ecker Jun 27 '21 at 22:47
  • In this case, id and each individual column need to be combined into one composite key, because the same value 3.05 in column A can appear in column A again. – Tranquil Oshan Jun 27 '21 at 22:50
  • `drop_duplicates` without a subset will only drop rows if the entire row matches. – Henry Ecker Jun 27 '21 at 22:51
  • So that's the solution then @HenryEcker – Tranquil Oshan Jun 27 '21 at 22:59
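The approach suggested in the comments can be sketched as follows. This is a minimal illustration, not a tested answer from the thread: it rebuilds the sample frame from the question, and `plot_deduped` is a hypothetical helper name introduced here. It assumes that dropping fully identical rows is enough for the real data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [12124, 12124, 5687, 5687, 7892],
    'A': [np.nan, np.nan, 3.05, 3.05, np.nan],
    'B': [1.05, 1.05, np.nan, np.nan, np.nan],
    'C': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'D': [np.nan, np.nan, np.nan, np.nan, 7.09],
})

# With no subset, drop_duplicates removes a row only when every column
# matches an earlier row (NaN in the same position counts as a match).
deduped = df.drop_duplicates()
print(deduped)


def plot_deduped(frame, columns):
    """Hypothetical helper: box plot of the chosen columns (needs matplotlib)."""
    # DataFrame.boxplot skips NaN values within each column.
    return frame.boxplot(column=columns)
```

Calling `plot_deduped(deduped, ['A', 'B', 'D'])` would draw the plot. Column C is left out of that call because it contains only NaN in the sample data.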

1 Answer

pandas can only work with a DataFrame in which every column has the same length and every row has the same length. In other words, only rectangular, frame-shaped data can be processed. If null or repeated values need to be counted only once per column, the columns end up with different lengths, which conflicts with that principle of the pandas package. Here is my suggestion: transform the DataFrame into lists, then draw the box plot from the list data and the index column.
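One way to read that suggestion is sketched below. It is an assumption about what the answer intends, not code from the answer itself: each column is reduced to a plain list that keeps one value per (id, value) pair, following the composite-key idea from the comments. The names `value_cols`, `data`, and `plot_lists` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [12124, 12124, 5687, 5687, 7892],
    'A': [np.nan, np.nan, 3.05, 3.05, np.nan],
    'B': [1.05, 1.05, np.nan, np.nan, np.nan],
    'C': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'D': [np.nan, np.nan, np.nan, np.nan, 7.09],
})

value_cols = ['A', 'B', 'C', 'D']

# For each column, drop NaN and keep one row per (id, value) pair.
# The resulting lists may have different lengths, which a dict allows.
data = {
    col: df[['id', col]].dropna().drop_duplicates()[col].tolist()
    for col in value_cols
}
print(data)  # {'A': [3.05], 'B': [1.05], 'C': [], 'D': [7.09]}


def plot_lists(lists_by_column):
    """Hypothetical helper: box plot from lists of unequal length."""
    import matplotlib.pyplot as plt

    # Skip columns with no values left so each box has data to draw.
    nonempty = {k: v for k, v in lists_by_column.items() if v}
    fig, ax = plt.subplots()
    ax.boxplot(list(nonempty.values()))
    ax.set_xticklabels(list(nonempty.keys()))
    return ax
```

With only one distinct value per column in the sample data, each box collapses to a single line, so a larger dataset is needed for a meaningful plot.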