I have 4 dataframes:

df1 = pd.DataFrame({'ID': [0, 0, 0, 0, 0, 0],
                    'value': [3.0, 3.5, 4.5, NaN, 7.0, 8.1]})

df2 = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1],
                    'value': [9.4, NaN, 4.5, 2.4, 4.0, 3.9]})

df3 = pd.DataFrame({'ID': [2, 2, 2],
                    'value': [1.0, 3.9, 4.1]})

df4 = pd.DataFrame({'ID': [3, 3, 3, 3],
                    'value': [NaN, NaN, 5.8, 3.0]})

I want to make a boxplot of the values in the value column of each dataframe. I did the following:

fig, ax2 = plt.subplots()
vec = [df1['value'].values,df2['value'].values,df3['value'].values,df4['value'].values]
labels = ['ID_0','ID_1', 'ID_2', 'ID_3']
ax2.boxplot(vec, labels = labels)
ax2.set_title('Values')
plt.show()

But it doesn't work and gives me an empty plot. Is there a better way to do this?

Traceback

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [3], in <cell line: 1>()
      1 df1 = pd.DataFrame({'ID': [0, 0, 0, 0, 0, 0],
----> 2                     'value': [3.0, 3.5, 4.5, NaN, 7.0, 8.1]})
      4 df2 = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1],
      5                     'value': [9.4, NaN, 4.5, 2.4, 4.0, 3.9]})
      7 df3 = pd.DataFrame({'ID': [2, 2, 2],
      8                     'value': [1.0, 3.9, 4.1]})

NameError: name 'NaN' is not defined
Trenton McKinney
EngGu

1 Answer

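NaN is not a built-in Python name, which is why you get the NameError. Use np.nan instead (with import numpy as np). Also, matplotlib's boxplot does not skip NaN values, so any series that contains one is drawn as an empty box; call dropna() before plotting. Making the changes...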

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Use np.nan for missing values and drop those rows before plotting
df1 = pd.DataFrame({'ID': [0, 0, 0, 0, 0, 0], 'value': [3.0, 3.5, 4.5, np.nan, 7.0, 8.1]}).dropna()
df2 = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1], 'value': [9.4, np.nan, 4.5, 2.4, 4.0, 3.9]}).dropna()
df3 = pd.DataFrame({'ID': [2, 2, 2], 'value': [1.0, 3.9, 4.1]}).dropna()
df4 = pd.DataFrame({'ID': [3, 3, 3, 3], 'value': [np.nan, np.nan, 5.8, 3.0]}).dropna()

fig, ax2 = plt.subplots()
vec = [df1['value'].values, df2['value'].values, df3['value'].values, df4['value'].values]
labels = ['ID_0', 'ID_1', 'ID_2', 'ID_3']
ax2.boxplot(vec, labels=labels)
ax2.set_title('Values')
plt.show()

gives you a boxplot with one box for each of ID_0, ID_1, ID_2 and ID_3.
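On the "is there a better way" part: one option, sketched here rather than offered as the definitive approach, is to stack the four frames with pd.concat and build the per-ID arrays with groupby, so the NaN handling and the labels come from the data instead of being typed out by hand. The names combined and groups are made up for this illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'ID': [0, 0, 0, 0, 0, 0], 'value': [3.0, 3.5, 4.5, np.nan, 7.0, 8.1]})
df2 = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1], 'value': [9.4, np.nan, 4.5, 2.4, 4.0, 3.9]})
df3 = pd.DataFrame({'ID': [2, 2, 2], 'value': [1.0, 3.9, 4.1]})
df4 = pd.DataFrame({'ID': [3, 3, 3, 3], 'value': [np.nan, np.nan, 5.8, 3.0]})

# Stack the frames into one long table (hypothetical name: combined)
combined = pd.concat([df1, df2, df3, df4], ignore_index=True)

# One NaN-free array per ID, keyed by the label to show on the x axis
groups = {
    f"ID_{key}": grp['value'].dropna().to_numpy()
    for key, grp in combined.groupby('ID')
}

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()))
# boxplot places the boxes at x = 1..n, so the labels map onto them in order
ax.set_xticklabels(list(groups.keys()))
ax.set_title('Values')
```

Call plt.show() afterwards to display the figure. Adding a fifth dataframe then only means adding it to the pd.concat list.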

Redox