I have a DataFrame with a column (t_seg_size) that I want to split into equal-width segments (e.g. 0-1000000, 1000001-2000000, etc.) and then generate summary statistics for each segment.
My approach is to loop over the segment boundaries, select the rows whose t_seg_size falls inside each segment, and compute statistics such as .std() on that subset.
Here is the code:
    for x in range(1000000, 200000000, 1000000):
        print(df3[(x-999999 < df3["t_seg_size"] < x)].t_seg_size.std())
On the first iteration the loop should select the rows with t_seg_size between 1 and 1000000 and print their standard deviation, then do the same for each later segment. However, I receive the following error:
ValueError Traceback (most recent call last)
<ipython-input-65-ee3e9911be81> in <module>()
2 #df3[df3["t_seg_size"] > 2000000].describe()
3 for x in range (1000000, 200000000, 1000000):
----> 4 print(df3[(1000000 < df3["t_seg_size"] < x)].t_seg_size.std())
C:\Users\xxxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
696 raise ValueError("The truth value of a {0} is ambiguous. "
697 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 698 .format(self.__class__.__name__))
699
700 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any help would be greatly appreciated.
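A likely cause, sketched here rather than confirmed against the original data: Python evaluates a chained comparison such as `a < s < b` as `(a < s) and (s < b)`, and `and` asks for the truth value of a whole Series, which pandas refuses to give. The sketch below shows two ways around it. The sample `df3`, the bin `width`, the five-bin range and the choice of right-closed intervals are all assumptions made for illustration; only the column name `t_seg_size` comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df3; only the column name is taken from the question.
rng = np.random.default_rng(0)
df3 = pd.DataFrame({"t_seg_size": rng.integers(1, 5_000_001, size=1_000)})

width = 1_000_000   # assumed segment width
n_bins = 5          # assumed number of segments for this small example

# Option 1: keep the loop, but combine two element-wise comparisons with &.
# Each comparison needs its own parentheses because & binds tighter than < and <=.
loop_std = {}
for upper in range(width, n_bins * width + 1, width):
    mask = (df3["t_seg_size"] > upper - width) & (df3["t_seg_size"] <= upper)
    loop_std[upper] = df3.loc[mask, "t_seg_size"].std()

# Option 2: label every row with its segment via pd.cut, then group once.
# pd.cut produces right-closed intervals by default: (0, 1000000], (1000000, 2000000], ...
edges = np.arange(0, n_bins * width + 1, width)
bins = pd.cut(df3["t_seg_size"], bins=edges)
binned_std = df3["t_seg_size"].groupby(bins, observed=False).std()

print(binned_std)
```

Both options should give the same per-segment standard deviations. The grouped version also makes it easy to swap `.std()` for `.describe()` or `.agg(["mean", "std", "count"])` when more statistics per segment are needed.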