
I have a census dataset indexed by state name and county name and want to loop through each row to find the max and min value across all columns labelled as 'population estimate in each year', then subtract these two values. I want the function to return a Pandas Series with index and value.

Here is my current code:

columns_to_keep=[
    'STNAME',
    'CTYNAME',
    'POPESTIMATE2010',
    'POPESTIMATE2011',
    'POPESTIMATE2012',
    'POPESTIMATE2013',
    'POPESTIMATE2014',
    'POPESTIMATE2015' 
]
df=census_df[columns_to_keep]

def answer_seven(lst):
    lst=[df['POPESTIMATE2010'],df['POPESTIMATE2011'],df['POPESTIMATE2012'],
             df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]

    return max(lst)-min(lst)

answer_seven(lst)

Error message:

ValueError                                Traceback (most recent call last)
<ipython-input-110-845350b0b5f7> in <module>()
     18     return max(lst)-min(lst)
     19 
---> 20 answer_seven(lst)
     21 

<ipython-input-110-845350b0b5f7> in answer_seven(lst)
     16              df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]
     17 
---> 18     return max(lst)-min(lst)
     19 
     20 answer_seven(lst)

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in __nonzero__(self)
    890         raise ValueError("The truth value of a {0} is ambiguous. "
    891                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892                          .format(self.__class__.__name__))
    893 
    894     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Tokaalmighty
  • Possible duplicate of [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) – Ian Jul 12 '17 at 18:48

3 Answers


Pandas can do this directly:

cols_of_interest = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
df[cols_of_interest].max(axis=1) - df[cols_of_interest].min(axis=1)

This returns a Series that keeps the original index of your DataFrame, where each value is that row's maximum minus its minimum.
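As a small illustration of the row-wise approach, here is a sketch on made-up data (the two counties and their numbers are invented, and `answer_seven` is rewritten here to take the frame and column list as arguments). Note that the built-in `max` in the question compares whole Series with `>`, which is what raises the `ValueError`; `DataFrame.max(axis=1)` instead reduces across the columns within each row.

```python
import pandas as pd

# Made-up stand-in for census_df; values are invented for illustration.
census_df = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama"],
    "CTYNAME": ["Autauga County", "Baldwin County"],
    "POPESTIMATE2010": [100, 50],
    "POPESTIMATE2011": [120, 45],
    "POPESTIMATE2012": [110, 60],
})

# Index by state and county, as described in the question.
df = census_df.set_index(["STNAME", "CTYNAME"])
cols_of_interest = ["POPESTIMATE2010", "POPESTIMATE2011", "POPESTIMATE2012"]


def answer_seven(frame, cols):
    """Return, per row, the largest minus the smallest value among cols."""
    return frame[cols].max(axis=1) - frame[cols].min(axis=1)


result = answer_seven(df, cols_of_interest)
print(result)
```

The result is a Series carrying the same (STNAME, CTYNAME) index as `df`, so individual counties can be looked up with `result.loc[(state, county)]`.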

johnchase

Or consider numpy.ptp for speed:

The NumPy documentation describes it as: "Range of values (maximum - minimum) along an axis."

np.ptp(df[cols_of_interest].values, axis=1)
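One caveat: `np.ptp` returns a plain NumPy array, so the index is lost. If a Series is needed, as the question asks, the array can be wrapped with the frame's own index. A minimal sketch on made-up data (the sample values and the name `spread` are invented for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data indexed by state and county; values are invented.
index = pd.MultiIndex.from_tuples(
    [("Alabama", "Autauga County"), ("Alabama", "Baldwin County")],
    names=["STNAME", "CTYNAME"],
)
df = pd.DataFrame(
    {
        "POPESTIMATE2010": [100, 50],
        "POPESTIMATE2011": [120, 45],
        "POPESTIMATE2012": [110, 60],
    },
    index=index,
)
cols_of_interest = ["POPESTIMATE2010", "POPESTIMATE2011", "POPESTIMATE2012"]

# Row-wise range as an ndarray, then re-attach the original index.
values = np.ptp(df[cols_of_interest].to_numpy(), axis=1)
spread = pd.Series(values, index=df.index)
print(spread)
```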
Brad Solomon

I had trouble with NaN values that I needed to keep and used the following:

x = {}
for col in df_count:
    x[col] = df_count[col].max() - df_count[col].min()
pd.Series(x)
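Note that this loop works per column rather than per row, and that pandas' `max` and `min` skip NaN by default. The sketch below, on an invented `df_count`, shows a vectorised equivalent of the loop and, as an assumption about what "keeping" NaN might mean, a variant using `skipna=False` so that a column containing a missing value yields NaN:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_count; column "b" contains a missing value.
df_count = pd.DataFrame({
    "a": [1.0, 4.0, 2.0],
    "b": [10.0, np.nan, 7.0],
})

# Same per-column result as the loop: NaN is skipped by default.
col_range = df_count.max() - df_count.min()

# With skipna=False, a column that contains NaN produces NaN instead.
col_range_strict = df_count.max(skipna=False) - df_count.min(skipna=False)

print(col_range)
print(col_range_strict)
```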
CJW