I have a histogram created from a pandas dataframe, and I would like to add a vertical dashed line representing the mean of the dataset. I have reviewed this thread, which shows exactly the style I am looking for; however, I cannot figure out how to make it work with my code (below):

import pandas as pd
import matplotlib.pyplot as plt

#import csv file into pandas dataframe
df = pd.read_csv('/path/to/my/file')

#calculating mean
m = df.mean()
#print(m)

#plotting histogram 
df.plot(kind='hist')
#plt.axvline(m, color = 'r', linestyle = 'dashed', linewidth = 2)

With the plt.axvline line uncommented, I end up receiving this error:

 ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I'm not sure what that means; any help would be appreciated.

EDIT: My data file is a CSV with one column; the first row is a header (string) and the 107 subsequent rows are values ranging from approximately 1.0E+11 to 4.0E+11.


Fake data (Python 2.7)

import io
import numpy as np
a = np.linspace(1, 4, num = 20)
s = 'E11\n'.join(map(str, a))
s += 'E11'
#print(s)
df = pd.read_csv(io.BytesIO(s))
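
The fake-data snippet above relies on Python 2 strings being bytes. A minimal sketch of the same data under Python 3, assuming io.StringIO is an acceptable stand-in for io.BytesIO (the rest is unchanged from the snippet above):

```python
import io

import numpy as np
import pandas as pd

# Same 20 evenly spaced values as above, written as text like "1.0E11"
a = np.linspace(1, 4, num=20)
s = 'E11\n'.join(map(str, a)) + 'E11'

# In Python 3 the joined text is a str, so wrap it in StringIO rather than BytesIO
df = pd.read_csv(io.StringIO(s))
```

As in the original, the first value is consumed as the column header, so the frame holds the remaining 19 values in a single column.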
NaN

2 Answers

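m is a pandas Series: df.mean() returns one mean per column, so it has an index and values rather than being a single number. axvline expects a scalar x position, and Matplotlib does not know how to handle a Series there.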

>>> print m
1.0E11    2.578947e+11
dtype: float64
>>> type(m)
<class 'pandas.core.series.Series'>
>>>

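The value of the mean is m[0] (written m.iloc[0] in recent pandas, where positional m[0] on a labeled index is deprecated) or m.values[0], so:

plt.axvline(m.iloc[0], color = 'r', linestyle = 'dashed', linewidth = 2)
#or
plt.axvline(m.values[0], color = 'r', linestyle = 'dashed', linewidth = 4)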
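
Putting the pieces together, here is a minimal end-to-end sketch. It assumes a made-up single-column frame (the column name "value" and the evenly spaced data are hypothetical stand-ins for the CSV) and selects the Agg backend only so the script runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: one column of 107 values between 1e11 and 4e11
df = pd.DataFrame({"value": np.linspace(1e11, 4e11, 107)})

# Taking the mean of a single column gives a scalar, not a Series
mean_value = df["value"].mean()

# df.plot returns the Axes, so the line can be added to the same plot
ax = df.plot(kind="hist")
line = ax.axvline(mean_value, color="r", linestyle="dashed", linewidth=2)
```

Calling plt.show() (or plt.savefig with a filename) afterwards should display or save the histogram with the dashed line at the mean.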
wwii

I think you should use m.all() instead of m, and then call plt.show() so that plt draws your histogram. The code will look like this:

#plotting histogram 
# df.plot(kind='hist')
plt.axvline(m.all(), color = 'r', linestyle = 'dashed', linewidth = 2)
plt.show()
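
One thing worth checking before relying on m.all(): as far as I can tell it reduces the Series to a single boolean ("are all entries non-zero?") rather than returning the mean, so the line would likely not land at the mean. A small sketch with made-up numbers (the column name "value" is hypothetical):

```python
import pandas as pd

# Made-up data on the same scale as the question
df = pd.DataFrame({"value": [1.0e11, 2.0e11, 4.0e11]})
m = df.mean()  # Series with one entry per column

flag = m.all()          # boolean: True when every entry is non-zero
mean_value = m.iloc[0]  # the actual mean of the single column

print(flag, mean_value)
```

Here flag is a truth value while mean_value is the number the dashed line is meant to mark.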
ida