Subtract the minimum value from the maximum value across each row, Python Pandas DataFrame

Question

I have a census dataset indexed by state name and county name. For each row, I want to find the maximum and minimum values across all the yearly population estimate columns (POPESTIMATE2010 to POPESTIMATE2015), then subtract the minimum from the maximum. I want the function to return a Pandas Series containing both the index and the value.

Here is my current code:

columns_to_keep=[
    'STNAME',
    'CTYNAME',
    'POPESTIMATE2010',
    'POPESTIMATE2011',
    'POPESTIMATE2012',
    'POPESTIMATE2013',
    'POPESTIMATE2014',
    'POPESTIMATE2015' 
]
df=census_df[columns_to_keep]

def answer_seven(lst):
    lst=[df['POPESTIMATE2010'],df['POPESTIMATE2011'],df['POPESTIMATE2012'],
             df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]

    return max(lst)-min(lst)

answer_seven(lst)

Error message:

ValueError                                Traceback (most recent call last)
<ipython-input-110-845350b0b5f7> in <module>()
     18     return max(lst)-min(lst)
     19 
---> 20 answer_seven(lst)
     21 

<ipython-input-110-845350b0b5f7> in answer_seven(lst)
     16              df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]
     17 
---> 18     return max(lst)-min(lst)
     19 
     20 answer_seven(lst)

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in __nonzero__(self)
    890         raise ValueError("The truth value of a {0} is ambiguous. "
    891                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892                          .format(self.__class__.__name__))
    893 
    894     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Possible duplicate of [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) — Ian, Jul 12 '17 at 18:48
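The linked duplicate covers the general error. Here is a minimal sketch of why the built-in `max()` triggers it, using two invented Series that stand in for two yearly columns. `max()` compares whole Series with `>`, which pandas evaluates element-wise into a boolean Series, and `max()` then needs a single True/False from that result, which pandas refuses to provide.

```python
import pandas as pd

# Two toy "year" columns; the values are invented for illustration only.
s2010 = pd.Series([100, 250, 80])
s2011 = pd.Series([120, 240, 90])

# Comparing two Series is element-wise: the result is a boolean Series,
# not a single True/False.
comparison = s2011 > s2010

# The built-in max() evaluates `s2011 > s2010` internally and calls bool()
# on the result, which raises the "truth value ... is ambiguous" ValueError.
error_message = ""
try:
    max([s2010, s2011])
except ValueError as err:
    error_message = str(err)

print(comparison.tolist())
print(error_message)
```

This is why the fix is to let pandas reduce along the rows instead of using Python's `max()`/`min()` on a list of Series.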

score 3 · Answer 1 · answered Jul 12 '17 at 18:48

Pandas can do this directly:

cols_of_interest = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012',
                    'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
df[cols_of_interest].max(axis=1) - df[cols_of_interest].min(axis=1)

This returns a Series indexed by the original index of your DataFrame, where each value is that row's maximum minus its minimum.
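A runnable sketch of this approach on a small hypothetical frame. The state/county names and all figures are invented, and `DataFrame.filter(like=...)` is a swapped-in convenience for the explicit column list above: it keeps every column whose name contains the given substring.

```python
import pandas as pd

# Hypothetical miniature frame shaped like the census data; every name and
# number here is invented for illustration.
df = pd.DataFrame(
    {
        "STNAME": ["State A", "State A", "State B"],
        "CTYNAME": ["County 1", "County 2", "County 3"],
        "POPESTIMATE2010": [1000, 5000, 300],
        "POPESTIMATE2011": [1100, 4800, 320],
        "POPESTIMATE2012": [1050, 4700, 400],
    }
).set_index(["STNAME", "CTYNAME"])

# Keep every column whose name contains "POPESTIMATE".
pop = df.filter(like="POPESTIMATE")

# axis=1 reduces across the columns of each row; the row index is preserved.
row_range = pop.max(axis=1) - pop.min(axis=1)
print(row_range)
```

The result keeps the `(STNAME, CTYNAME)` index, so each range stays attached to its county without any explicit loop.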

Brad Solomon · Accepted Answer · 2017-07-12T18:55:02.293


Or consider numpy.ptp for speed:

From the `numpy.ptp` documentation: "Range of values (maximum - minimum) along an axis."

np.ptp(df[cols_of_interest].values, axis=1)
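A sketch checking that `np.ptp` gives the same numbers as the pandas version, again on invented data. It uses `.to_numpy()`, the newer equivalent of `.values`. Note that the result is a bare NumPy array, so the row labels are dropped.

```python
import numpy as np
import pandas as pd

# Invented yearly estimate columns for illustration.
pop = pd.DataFrame(
    {
        "POPESTIMATE2010": [1000, 5000, 300],
        "POPESTIMATE2011": [1100, 4800, 320],
        "POPESTIMATE2012": [1050, 4700, 400],
    }
)

# np.ptp ("peak to peak") computes max - min along axis=1 (per row)
# and returns a plain ndarray without the DataFrame index.
ranges = np.ptp(pop.to_numpy(), axis=1)

# Same values as the pandas max/min approach.
pandas_ranges = (pop.max(axis=1) - pop.min(axis=1)).to_numpy()
same_as_pandas = np.array_equal(ranges, pandas_ranges)
print(ranges, same_as_pandas)
```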

edited Jul 12 '17 at 18:55

answered Jul 12 '17 at 18:43

Brad Solomon


How to also return the index label using this code? – Tokaalmighty Jul 12 '17 at 20:16
You could create a Series, `pd.Series(np.ptp(df[cols_of_interest].values, axis=1), index=df.index)`. – Brad Solomon Jul 12 '17 at 20:17
I was able to find the county with the maximum population difference across the years, but dtype is int64. How do I have it return only the county name (index) as a string? – Tokaalmighty Jul 13 '17 at 17:06
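For the follow-up comments, a sketch that wraps `np.ptp` in a Series as suggested and then uses `Series.idxmax`, which returns the index label of the largest value rather than the int64 value itself. It assumes the frame is indexed by `STNAME` and `CTYNAME`, as the question describes, so the label is a tuple; if the index held only county names, `idxmax()` would return the name directly. All data is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical frame indexed by (STNAME, CTYNAME); all values are invented.
df = pd.DataFrame(
    {
        "STNAME": ["State A", "State A", "State B"],
        "CTYNAME": ["County 1", "County 2", "County 3"],
        "POPESTIMATE2010": [1000, 5000, 300],
        "POPESTIMATE2011": [1100, 4800, 320],
        "POPESTIMATE2012": [1050, 4700, 400],
    }
).set_index(["STNAME", "CTYNAME"])
cols_of_interest = ["POPESTIMATE2010", "POPESTIMATE2011", "POPESTIMATE2012"]

# Re-attach the row labels to the bare ndarray returned by np.ptp.
ranges = pd.Series(np.ptp(df[cols_of_interest].to_numpy(), axis=1), index=df.index)

# idxmax gives the label of the largest range; with a two-level MultiIndex
# that label is a (state, county) tuple.
best_label = ranges.idxmax()
county_name = best_label[1]  # second level holds the county name
print(best_label, county_name)
```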

score 0 · Answer 3 · answered Nov 20 '18 at 09:31


I had trouble with NaN values that I needed to keep and used the following:

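import pandas as pd

# Range (max - min) of each column of df_count; max() and min() skip NaN by default
x = {}
for col in df_count:
    x[col] = df_count[col].max() - df_count[col].min()
pd.Series(x)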
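A sketch on an invented frame with one missing value, comparing three variants. The loop above is equivalent to `df_count.max() - df_count.min()`, a range per column, because pandas' `max()`/`min()` skip NaN by default (`skipna=True`). The per-row range the question asks for needs `axis=1`. `np.ptp` from the accepted answer does not skip NaN, so any row containing NaN comes back as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing estimate; values are invented.
df_count = pd.DataFrame(
    {
        "POPESTIMATE2010": [1000.0, np.nan, 300.0],
        "POPESTIMATE2011": [1100.0, 4800.0, 320.0],
    }
)

# Vectorized form of the loop: one range per column, NaN skipped.
per_column = df_count.max() - df_count.min()

# One range per row, NaN skipped (a row with a single valid value gives 0).
per_row = df_count.max(axis=1) - df_count.min(axis=1)

# np.ptp propagates NaN, so the second row's range is NaN.
ptp_rows = np.ptp(df_count.to_numpy(), axis=1)
print(per_column, per_row, ptp_rows, sep="\n")
```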

answered Nov 20 '18 at 09:31

CJW

